Wednesday, March 31, 2021

Never use environment variables for configuration

Suppose you need to create a function for adding two numbers together in plain C. How would you write it? What sort of an API would it have? One possible implementation would be this:

int add_numbers(int one, int two) {
    return one + two;
}

// to call it you'd do
int three = add_numbers(1, 2);

Seems reasonable? But what if it was implemented like this instead:

int first_argument;
int second_argument;

int add_numbers(void) {
    return first_argument + second_argument;
}

// to call it you'd do
first_argument = 1;
second_argument = 2;
int three = add_numbers();

This is, I trust you all agree, terrible. This approach is plain wrong, against all accepted coding practices and would get immediately rejected in any code review. It is left as an exercise to the reader to come up with ways in which this architecture is broken. You don't even need to look into thread safety to find correctness bugs.

And yet we have environment variables

Environment variables are exactly this: mutable global state. Envvars have some legitimate uses (such as enabling debug logging), but they should never, ever be used for configuring the core functionality of programs. Sadly they are used for this purpose a lot, and some people think that this is a good thing. This causes no end of headaches due to weird corner, edge and even common cases.

Persistence of state

For example, suppose you run a command line program that has some sort of persistent state:

$ SOME_ENVVAR=... some_command <args>

Then some time after that you run it again:

$ some_command <args>

The environment is now different. What should the program do? Use the old configuration that had the env var set or the new one where it is not set? Error out? Try to silently merge the different options into one? Something else?

The answer is that you, the end user, cannot know. Every program is free to do its own thing, and most do. If you have ever spent ages wondering why the exact same commands work when run from one terminal but not from another, this is probably why.

Lack of higher order primitives

An environment variable can only contain a single null-terminated string of bytes. This is very limiting. At the very least you'd want to have arrays, but they are not supported. Surely that is not a problem, you say, you can always do in-band signaling. For example, the PATH environment variable holds many directories separated by the : character. What could be simpler? Many things, it turns out.

First of all, the separator for paths is not always :. On Windows it is ;. More generally, every program is free to choose its own. A common choice is space:

CFLAGS='-Dfoo="bar" -Dbaz' <command>

Except what if you need to pass a space character as part of the argument? Depending on the actual program, shell and the phase of the moon, you might need to do this:

ARG='-Dfoo="bar bar" -Dbaz'

or this:

ARG='-Dfoo="bar\ bar" -Dbaz'

or even this:

ARG='-Dfoo="bar\\ bar" -Dbaz'

There is no way to know which one of these is the correct form. You have to try them all and see which one works. Sometimes, as an implementation detail, the string gets expanded multiple times so you get to quote quote characters. Insert your favourite picture of Xzibit here.

For comparison, with JSON configuration files this entire class of problems would not exist. Every application would read the data in the same way, because JSON provides primitives to express these higher level constructs. In contrast, every time an environment variable needs to carry more information than a single untyped string, the programmer gets to create a new ad hoc data marshaling scheme, and if there's one thing that guarantees usability it's reinventing the square wheel.

There is a second, more insidious part to this. If a decision is made to configure something via an environment variable, the entire design goal changes. Instead of coming up with a syntax that is as good as possible for the given problem, the goal becomes producing a syntax that is easy to use when typing commands on the terminal. This reduces work in the short term but increases it in the medium to long term.

Why are environment variables still used?

It's the same old trifecta of why things are bad and broken:

Envvars are easy to add
There are existing processes that only work via envvars
"This is the way we have always done it so it must be correct!"

The first explains why even new programs add configuration options via envvars (no need to add code to the command line parser, so that's a net win right?).

The second makes it seem like envvars are a normal and reasonable thing as they are so widespread.

The third makes it all but impossible to improve things on a larger scale. Now, granted, fixing these issues would be a lot of work and the transition would unearth a lot of bugs but the end result would be more readable and reliable.

29 comments:

stephensong (March 31, 2021 at 9:25 PM)
err, methinks you missed the main point of why envvars are used:

...
4: To prevent secret values appearing in source code, and thereby being published eg. via commits.
...

And do you have a better solution for this, the real use case?
,cte (April 1, 2021 at 3:38 AM)
I'll just leave this here:
https://github.com/ninja-build/ninja/issues/1482
Jean-Christophe "Jeko" Hoelt (April 1, 2021 at 9:51 AM)
Configuration files are just 1 big global variable (stored on disk instead of memory). The exact same issues raised above still apply.
Unknown (April 1, 2021 at 9:56 AM)
arrg, yes. Just today I was using hugo (the static site generator) and I was surprised that in order to deploy to azure/aws I needed to set the container and the key in env variables. And this was not from hugo, but from the Go library it was using. What if I want to deploy to multiple containers? Setting envvars for every command in another bash session seems hacky compared to storing the configuration in a file (like the other hugo settings).
Jussi (April 1, 2021 at 10:25 AM)
PATH is one of the things that works just fine as an environment variable.
Laino (April 1, 2021 at 3:07 PM)
"Environment variables is exactly this: mutable global state."

No they're absolutely not.
Unknown (April 1, 2021 at 4:16 PM)
PATH comes from the shell that spawns all the child processes. If we need to abandon ideas like PATH, we also need to abandon ideas like self, as in, how can I be sure that I am the person running my scripts? What if I am actually another human sitting in my chair? I can't prove I am me.

So it becomes pragmatic that we must have an initial set of things we can rely upon, and PATH variables are a-ok in my book because I can't handle an existential crisis that escapes into my shell configuration.
Kanat Bekt (April 1, 2021 at 4:23 PM)
April fools?
Henrique Vicente (April 1, 2021 at 5:15 PM)
I completely agree with you. Please ignore the trolls who believe they know everything. Using environment variables for configuration has many downsides, and unfortunately the situation seems out of control nowadays...

I wrote about this very thing a few months ago: https://henvic.dev/posts/env/
Leaddev (April 1, 2021 at 5:52 PM)
Global state is unavoidable.
We just need to make sure those are not mutable.
I would always prefer a big list of env variables in containers.
ENV variables can be passed as an env file or a Kubernetes secret in almost all languages and platforms. No need to over-engineer and call a remote secret while starting containers.
For reviewing, you can use the .env file or just run the set command in bash during development.
$ docker run --env-file ./env.list ubuntu bash
GRS (April 1, 2021 at 6:26 PM)
The problem with env is that once a program is running you can’t change envvars to modify its behaviour. That is, if your program was designed to reload on config changes.
Nicholas Sweeting (April 1, 2021 at 7:53 PM)
The title should be "don't sprinkle environment variable reads all over your codebase, and don't invent your own serialization formats", not "don't use environment variables".

Environment variables are so much better than the alternative. In the past, programs frequently invented their own configuration loading systems, but over the last few years containerization has strongly nudged most programs towards accepting env vars. The result has been a better, more consistent, and less surprising config experience for everyone, even non-container users.
Unknown (April 1, 2021 at 8:46 PM)
Environment variables are superior to file based configurations in many cases for the exact reason you are recommending against their use.

- Variations in config and runtime behavior depending on the *environment* - i.e. in local, dev and qa I want to disable sending emails or SMS messages to actual users. Perfect use case for an environment variable, like "OVERRIDE_PHONE_NUMBER=1111111111". This way I can test the system, override the actual phone number that's used, depending on the environment.

- As has already been mentioned, secrets. Storing these in config files is a bad idea. Vaults etc... may be better in some cases, but then you deal with the complexity of the chain of secrets necessary to access the vault. At some point there's something you need to store in a file or env variable. File is a bad idea.

- File based configs are fine for certain things, but run a high risk of getting added to SCC. In some cases this is fine, but most folks (me included) will admit to succumbing to the temptation of adding Twilio keys (or similar) to a file config.

- Anyone doing container-based development leans on ENV variables quite a bit. This is a feature. Environment variables enable this.

- "Never" is a strong term. As you've seen in the comments on this post, there are several good, practical and solid reasons where and when you may want to use environment variables for configuration. It only takes one to ruin your thesis.
codemac (April 1, 2021 at 9:22 PM)
Plan9 largely got rid of PATH with bind mounting everything to /bin. Then shells just had a simple implementation (look for file in /bin) and the filesystem took care of any overlaps.

You could also hypothetically have a file named "$HOME/.pathrc" that is the list of paths, and in what order, you want all programs that launch programs to use (almost exactly like your .bashrc doing export PATH=blah today).

There are tons of these types of tradeoffs, and none require environment variables. It is rare that someone uses an environment variable well, especially if it doesn't relate to a user's interactive shell session.
Prune (April 6, 2021 at 9:45 PM)
Maybe ENVVARS are not great, but JSON config files are not much better... YAML then?
What are your thoughts on that ?