#include<stdio.h>
int main(int argc, char **argv) {
printf("Hello, world.\n");
return 0;
}
The only (direct) information passed to the program is an array of strings containing its (command line) arguments. Thus it seems like an obvious conclusion that there is a corresponding function that takes an executable to run and an array of strings with the arguments. This turns out to be the case, and it is what the exec family of functions do. An example would be execve.
This function only exists on posixy operating systems, it is not available on Windows. The native way to start processes on Windows is the CreateProcess function. It does not take an array of strings, instead it takes a string:
BOOL CreateProcessA(
LPCSTR lpApplicationName,
LPSTR lpCommandLine,
...
The operating system then internally splits the string into individual components using an algorithm that is not at all simple or understandable and whose details most people don't even know.
Side note: why does Windows behave this way?
I don't know for sure. But we can formulate a reasonable theory by looking in the past. Before Windows existed there was DOS, and it also had a way of invoking processes. This was done by using interrupts, in this case function 4bh in interrupt 21h. Browsing through online documentation we can find a relevant snippet:
Action: Loads a program for execution under the control of an existing program. By means of altering the INT 22h to 24h vectors, the calling prograrn [sic] can ensure that, on termination of the called program, control returns to itself.
On entry: AH = 4Bh
AL = 0: Load and execute a program
AL = 3: Load an overlay
DS.DX = segment:offset of the ASCIIZ pathname
ES:BX = Segment:offset of the parameter block
Parameter block bytes:
0-1: Segment pointer to envimmnemnt [sic] block
2-3: Offset of command tail
4-5: Segment of command tail
Here we see that the command is split in the same way as in the corresponding Win32 API call, into ta command to execute and a single string that contains the arguments (the command tail, though some sources say that this should be the full command line). This interrupt handler could take an array of strings instead, but does not. Most likely this is because it was the easiest thing to implement in real mode x86 assembly.
When Windows 1.0 appeared, its coders probably either used the DOS calls directly or copied the same code inside Windows' code base for simplicity and backwards compatibility. When the Win32 API was created they probably did the exact same thing. After all, you need the single string version for backwards compatibility anyway, so just copying the old behaviour is the fast and simple thing to do.
Why is this behaviour bad?
There are two main use cases for invoking processes: human invocations and programmatic invocations. The former happens when human beings type shell commands and pipelines interactively. The latter happens when programs invoke other programs. For the former case a string is the natural representation for the command, but this is not the case for the latter. The native representation there is an array of strings, especially for cross platform code because string splitting rules are different on different platforms. Implementing shell-based process invocation on top of an interface that takes an array of strings is straightforward, but the opposite is not.
Often command lines are not invoked directly but are instead passed from one program to another, stored to files, passed over networks and so on. It is not uncommon to pass a full command line as a command line argument to a different "wrapper" command and so on. An array of string is trivial to pass through arbitrarily deep and nested scenarios without data loss. Plain strings not so much. Many, many, many programs do command string splitting completely wrong. They might split it on spaces because it worksforme on this machine and implementing a full string splitter is a lot of work (thousands of lines of very tricky C at the very least). Some programs don't quote their outputs properly. Some don't unquote their inputs properly. Some do quoting unreliably. Sometimes you need to know in advance how many layers of unquoting your string will go through in advance so you can prequote it sufficiently beforehand (because you can't fix any of the intermediate blobs). Basically every time you pass commands as strings between systems, you get a parsing/quoting problem and a possibility for shell code injection. At the very least the string should carry with it information on whether it is a unix shell command line or a cmd.exe command line. But it doesn't, and can't.
Because of this almost all applications that deal with command invocation kick the can down the road and use strings rather than arrays, even though the latter is the "correct" solution. For example this is what the Ninja build system does. If you go through the rationale for this it is actually understandable and makes sense. The sad downside is that everyone using Ninja (or any such tool) has to do command quoting and parsing manually and then ninja-quote their quoted command lines.
This is the crux of the problem. Because process invocation is broken on Windows, every single program that deals with cross platform command invocation has to deal with commands as strings rather than an array of strings. This leads to every program using commands as strings, because that is the easy and compatible thing to do (not to mention it gives you the opportunity to close bugs with "your quoting is wrong, wontfix"). This leads to a weird kind of quantum entanglement where having things broken on one platform breaks things on a completely unrelated platform.
Can this be fixed?
Conceptually the fix is simple: add a new function, say, CreateProcessCmdArray to Win32 API. It is identical to plain CreateProcess except that it takes an array of strings rather than a shell command string. The latter can be implemented by running Windows' internal string splitter algorithm and calling the former. Seems doable, and with perfect backwards compatibility even? Sadly, there is a hitch.
It has been brought to my attention via unofficial channels [1] that this will never happen. The people at Microsoft who manage the Win32 API have decreed this part of the API frozen. No new functionality will ever be added to it. The future of Windows is WinRT or UWP or whatever it is called this week.
UWP is conceptually similar to Apple's iOS application bundles. There is only one process which is fully isolated from the rest of the system. Any functionality that need process isolation (and not just threads) must be put in its own "service" that the app can then communicate with using RPC. This turned out to be a stupid limitation for a desktop OS with hundreds of thousands of preexisting apps, because it would require every Win32 app using multiple processes to be rewritten to fit this new model. Eventually Microsoft caved under app vendor pressure and added the functionality to invoke processes into UWP (with limitations though). At this point they had a chance to do a proper from-scratch redesign for process invocation with the full wealth of knowledge we have obtained since the original design was written around 1982 or so. So can you guess whether they:
- Created a proper process invocation function that takes an array of strings?
- Exposed CreateProcess unaltered to UWP apps?
Bonus chapter: msvcrt's execve functions
Some of you might have thought waitaminute, the Visual Studio C runtime does ship with functions that take string arrays so this entire blog post is pointless whining. This is true, it does provide said functions. Here is a pseudo-Python implementation for one of them. It is left as an exercise to the reader to determine why it does not help with this particular problem:
def spawn(cmd_array):
cmd_string = ' '.join(cmd_array)
CreateProcess(..., cmd_string, ...)
[1] That is to say, everything from here on may be completely wrong. Caveat lector. Do not quote me on this.