Wednesday, March 31, 2021

Never use environment variables for configuration

Suppose you need to create a function for adding two numbers together in plain C. How would you write it? What sort of an API would it have? One possible implementation would be this:

int add_numbers(int one, int two) {
    return one + two;
}

// to call it you'd do
int three = add_numbers(1, 2);

Seems reasonable? But what if it was implemented like this instead:

int first_argument;
int second_argument;

int add_numbers(void) {
    return first_argument + second_argument;
}

// to call it you'd do
first_argument = 1;
second_argument = 2;
int three = add_numbers();

This is, I trust you all agree, terrible. This approach is plain wrong, against all accepted coding practices and would get immediately rejected in any code review. It is left as an exercise to the reader to come up with ways in which this architecture is broken. You don't even need to look into thread safety to find correctness bugs.

And yet we have environment variables

Environment variables are exactly this: mutable global state. Envvars have some legitimate usages (such as enabling debug logging) but they should never, ever be used for configuring core functionality of programs. Sadly they are used for this purpose a lot and there are some people who think that this is a good thing. This causes no end of headaches due to weird corner, edge and even common cases.

Persistence of state

For example suppose you run a command line program that has some sort of a persistent state.

$ SOME_ENVVAR=... some_command <args>

Then some time after that you run it again:

$ some_command <args>

The environment is now different. What should the program do? Use the old configuration that had the env var set or the new one where it is not set? Error out? Try to silently merge the different options into one? Something else?

The answer is that you, the end user, cannot know. Every program is free to do its own thing and most do. If you have ever spent ages wondering why the exact same commands work when run from one terminal but not the other, this is probably why.

Lack of higher order primitives

An environment variable can only contain a single null-terminated stream of bytes. This is very limiting. At the very least you'd want to have arrays, but they are not supported. Surely that is not a problem, you say, you can always do in-band signaling. For example the PATH environment variable has many directories which are separated by the : character. What could be simpler? Many things, it turns out.

First of all the separator for paths is not always :. On Windows it is ;. More generally every program is free to choose its own. A common choice is space:

CFLAGS='-Dfoo="bar" -Dbaz' <command>

Except what if you need to pass a space character as part of the argument? Depending on the actual program, shell and the phase of the moon, you might need to do this:

ARG='-Dfoo="bar bar" -Dbaz'

or this:

ARG='-Dfoo="bar\ bar" -Dbaz'

or even this:

ARG='-Dfoo="bar\\ bar" -Dbaz'

There is no way to know which one of these is the correct form. You have to try them all and see which one works. Sometimes, as an implementation detail, the string gets expanded multiple times so you get to quote quote characters. Insert your favourite picture of Xzibit here.

For comparison, with JSON configuration files this entire class of problems would not exist. Every application would read the data in the same way, because JSON provides primitives to express these higher level constructs. In contrast, every time an environment variable needs to carry more information than a single untyped string, the programmer gets to create a new ad hoc data marshaling scheme, and if there's one thing that guarantees usability it's reinventing the square wheel.

There is a second, more insidious part to this. If a decision is made to configure something via an environment variable then the entire design goal changes. Instead of coming up with a syntax that is as good as possible for the given problem, the goal becomes producing a syntax that is easy to type on the terminal. This reduces work in the immediate short term but increases it in the medium to long term.

Why are environment variables still used?

It's the same old trifecta of why things are bad and broken:

  1. Envvars are easy to add
  2. There are existing processes that only work via envvars
  3. "This is the way we have always done it so it must be correct!"

The first explains why even new programs add configuration options via envvars (no need to add code to the command line parser, so that's a net win, right?).

The second makes it seem like envvars are a normal and reasonable thing as they are so widespread.

The third makes it all but impossible to improve things on a larger scale. Now, granted, fixing these issues would be a lot of work and the transition would unearth a lot of bugs but the end result would be more readable and reliable.

30 comments:

  1. err, methinks you missed the main point of why envvars are used:

    ...
    4: To prevent secret values appearing in source code, and thereby being published eg. via commits.
    ...

    And do you have a better solution for this, the real use case?

    Replies
    1. Put them in configuration files in known locations, the way e.g. ssh, gpg and Github tokens work. Or, alternatively, do "your-command --args --password-file=secrets.txt".

      But passing key material to programs is one of the cases where envvars _might_ be ok because it is not configuration in the sense that "choose whether the program does A or B or C".

    2. There are lots of occasions where the same "binary" software needs to be configured differently for different users, different machines, etc.

      You can put your environment specific details into a file, into environment variables, or into a script that puts them on the command line. They're all different types of "mutable state" (although I don't see that this is a useful description), and they are ultimately stored on the disk.

      As far as I can tell, complaining that environment variables are worse than the other methods boils down to saying that you don't like the environment variable API (the "set" or "setx" commands, or control panels, etc.).

    3. This is how most people use them; it seems unsafe to use them in any other way.

    4. Secrets are literally the only reason I use envvars but I'm transitioning to use secret managers instead (AWS Secret Manager) in my case.

    5. An .env is a configuration file... It is like a standard for "configuration files." If you want to make life simpler, then rail against bloated complex software and non-stop barrage of bikeshedding of engineering paradigms.

    6. You can also pass a file to be converted to env variables. Don't overengineer these things
      $ docker run -e MYVAR1 --env MYVAR2=foo --env-file ./env.list ubuntu bash

    7. Env vars are really not a good mechanism for passing secrets. It's very easy for them to leak, e.g. they are visible in process viewers. Dedicated secret services are better, but even then you'll need a secret to access the secret service. So env var again? With a recent systemd even this isn't necessary: https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Credentials

      Note that it requires a fairly recent systemd (in Debian terms).

  2. I'll just leave this here:
    https://github.com/ninja-build/ninja/issues/1482

  3. Configuration files are just 1 big global variable (stored on disk instead of memory). The exact same issues raised above still apply.

    Replies
    1. Simply having it in a file in one place makes it a lot easier to review and validate (with a schema validator even if you've got one). With envvars you can't do that because they are distributed all over the place and can be set or altered at any time by anyone without a trace.

    2. That's just from the PoV of which programs you know the best. If you don't know a program, then you don't know much about its configuration file. If you do know a program - like docker-compose - then you know certain things can be set in its root path .env... Which is nicer than "configuration files" in some ways: it is more standard.

    3. Any sensible programmer still has a central environment variable location. For development, this might be a `.env` file (so...exactly the same as a JSON file, just a set of KV pairs instead of a JSON object). In production, this might be a console screen on your deployment software.

      The point here is that if you're setting environment variables in a series of different locations, it's because you're not organized - not because environment variables are dumb.

    4. The location where the envvars are defined doesn't really matter. The process will get the set of envvars no matter if they're defined in one file or spewed all over the system.

  4. arrg, yes. Just today I was using hugo (the static site generator) and I was surprised that in order to deploy to azure/aws I needed to set the container and the key in env variables. And this was not from hugo, but from the Go library that it was using. What if I want to deploy to multiple containers? Setting envvars for every command in another bash seems hacky compared to storing the conf in a file (like the other hugo settings).

    Replies
    1. You already tried this, right?
      $ docker run -e MYVAR1 --env MYVAR2=foo --env-file ./env.list ubuntu bash

  5. So we get rid of PATH? What would you suggest as a replacement, and which works in a similar convenient way?

    Replies
    1. PATH is one of the things that works just fine as an environment variable.

    2. PATH comes from the shell that spawns all the child processes. If we need to abandon ideas like PATH, we also need to abandon ideas like self, as in, how can I be sure that I am the person running my scripts? What if I am actually another human sitting in my chair? I can't prove I am me.

      So it becomes pragmatic that we must have an initial set of things we can rely upon, and PATH variables are a-ok in my book because I can't handle an existential crisis that escapes into my shell configuration.

    3. Plan9 largely got rid of PATH with bind mounting everything to /bin. Then shells just had a simple implementation (look for file in /bin) and the filesystem took care of any overlaps.

      You could also hypothetically have a file named "$HOME/.pathrc" that is the list of paths, and in what order, you want all programs that launch programs to use (almost exactly like your .bashrc doing export PATH=blah today).

      There are tons of these types of tradeoffs, and none require environment variables. It is rare that someone uses an environment variable well, especially if it doesn't relate to a user's interactive shell session.

  6. "Environment variables are exactly this: mutable global state."

    No they're absolutely not.

    Replies
    1. If one is talking about something like Docker where you run one thing in its own isolated sandbox then they aren't. But on the other hand in the user's login session where you have tens of shells and programs all working together then they are.

  7. I completely agree with you. Please ignore the trolls who believe they know everything. Using environment variables for configuration has many downsides, and unfortunately the situation seems out of control nowadays...

    I wrote about this very thing a few months ago: https://henvic.dev/posts/env/

  8. Global state is unavoidable.
    We just need to make sure those are not mutable.
    I would always prefer big list of env variables in containers.
    ENV variables can be passed as an env file or a Kubernetes secret in almost all languages and platforms. No need to overengineer and call a remote secret service while starting containers.
    For reviewing, you can use the .env file or just run the bash set command during development.
    $ docker run --env-file ./env.list ubuntu bash

  9. Problem with env is that once a program is running you can’t change envvars to modify the behaviour. That is, if your program was designed to reload on config change.

  10. The title should be "don't sprinkle environment variable reads all over your codebase, and don't invent your own serialization formats", not "don't use environment variables".

    Environment variables are so much better than the alternative. In the past, programs frequently invented their own configuration loading systems, but over the last few years containerization has strongly nudged most programs towards accepting env vars. The result has been a better, more consistent, and less surprising config experience for everyone, even non-container users.

  11. Environment variables are superior to file based configurations in many cases for the exact reason you are recommending against their use.

    - Variations in config and runtime behavior depending on the *environment* - i.e. in local, dev and qa I want to disable sending emails or SMS messages to actual users. Perfect use case for an environment variable, like "OVERRIDE_PHONE_NUMBER=1111111111". This way I can test the system, override the actual phone number that's used, depending on the environment.

    - As has already been mentioned, secrets. Storing these in config files is a bad idea. Vaults etc... may be better in some cases, but then you deal with the complexity of the chain of secrets necessary to access the vault. At some point there's something you need to store in a file or env variable. File is a bad idea.

    - File based configs are fine for certain things, but run a high risk of getting added to SCC. In some cases this is fine, but most folks (me included) will admit to succumbing to the temptation of adding Twilio keys (or similar) to a file config.

    - Anyone doing container based development leans on ENV variables quite a bit. This is a feature. Environment variables enable this.

    - "Never" is a strong term. As you've seen in the comments on this post, there are several good, practical and solid reasons where and when you may want to use environment variables for configuration. It only takes one to ruin your thesis.

  12. Maybe ENVVARS are not great, but JSON config files are not much better... YAML then?
    What are your thoughts on that?

    Replies
    1. Well that's a whole different flame war. ;-)

      YAML is nice except for the fact that it allows arbitrary code execution (as per Wikipedia at least)
