As we all know, Linux applications are tied to the distro they were built on. They do not work on other distros, or even on later releases of the same distro. Anyone who has tried to run old binaries knows this. It is the commonly accepted truth.
But is it the actual truth?
In preparation for my LCA2016 presentation I decided to test this. I took Debian Lenny from 2009 and installed it in VirtualBox. Lenny is the oldest version that would work; older releases fail because distro mirrors no longer carry their packages. Then I downloaded GTK+ 1.2 from the previous millennium (1999). I built and installed it, together with GLib 1, into a standalone directory, and finally compiled a helloworld application and a launcher script and put them in the same dir.
This directory formed an application bundle that is almost identical to an OSX app bundle. The main difference from a distro package is that it embeds all its dependencies rather than relying on distro packages. This is exactly how one would have produced a standalone binary at the time. I copied the package to a brand new Fedora install and launched it. This is the result.
This is a standard Fedora install running Gnome 3. The gray box in the middle is the GTK+ 1.2 application. It ran with no changes whatsoever. This is even more impressive when you consider that it represents a backwards compatibility span of over 15 years, across two completely different Linux distributions and even different CPU architectures (Lenny was x86 whereas Fedora is x86_64).
A gathering of development thoughts of Jussi Pakkanen. Some of you may know him as the creator of the Meson build system.
Thursday, September 24, 2015
Tuesday, August 18, 2015
Proposal for a dependency security scheme for sandboxed apps
Perhaps the main disadvantage of embedding external projects as source code rather than using distro packages is the loss of security. Linux distributions have lots of people working on keeping distro packages up to date and safe. With embedded source code this is no longer the case. The developer is in charge of keeping the application up to date. Some developers are better at it than others.
What makes this worse is the fact that if you embed (and especially statically link) your dependencies it is impossible to know what versions of which libraries you are using. If this information were available, then the host operating system could verify the list of embedded dependencies against a known white- or blacklist. The packaging format simply does not have this information.
So let's put it there.
A merge proposal was recently submitted to Meson that makes it create (and optionally install) a dependency manifest for each generated binary. The manifest is simply a JSON file that lists all the embedded dependencies a given binary uses. Its proposed format looks like this.
{
    "type": "dependency manifest",
    "version": 1,
    "dependencies": {
        "entity": "1.0"
    }
}
In this case the executable has only one dependency: the project entity, version 1.0. Other such dependencies could include zlib version 1.2.8 or openssl version 1.0.2d. The project names and releases would mirror upstream releases. This manifest would make it easy to guard against binaries that have embedded unsafe versions of their dependencies.
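To make the idea concrete, here is a minimal sketch (not part of the actual proposal) of how a host system might check such a manifest against a list of known-bad releases. The blacklist contents and function names are hypothetical:

```python
import json

# Hypothetical blacklist of (project, version) pairs known to be unsafe.
BLACKLIST = {("openssl", "1.0.1f")}

def unsafe_dependencies(manifest_text):
    """Return the embedded dependencies that appear on the blacklist."""
    manifest = json.loads(manifest_text)
    if manifest.get("type") != "dependency manifest":
        raise ValueError("not a dependency manifest")
    return [(name, version)
            for name, version in manifest["dependencies"].items()
            if (name, version) in BLACKLIST]

example = """{
  "type": "dependency manifest",
  "version": 1,
  "dependencies": {"entity": "1.0", "openssl": "1.0.1f"}
}"""
```

A package manager or app store could run a check like this at install time and refuse, or at least warn about, binaries that embed known-vulnerable releases.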
But wait, it gets better.
All dependencies provided by the Wrap database would (eventually ;) expose this information, so the manifest would be generated automatically. The developer does not need to do anything to get it built, only to say that they want it installed. It is simply a byproduct of using the dependency.
As the Linux application installation experience keeps moving away from distro packages and towards things such as xdg-app, snappy and the like, keeping applications secure becomes ever trickier. This is one proposal that is already working and testable today. Hopefully it will see adoption in the community.
Single question mini-faq
What happens if someone creates a fraudulent or empty manifest file for their app?
The exact same thing as now, when there is no manifest info of any kind. This scheme is not meant to protect against Machiavelli, only against Murphy.
Wednesday, August 5, 2015
Make your tool programs take file name arguments
There are a lot of utility programs for text manipulation, scanning and so on. Often they are written as filters so you can use them with shell redirection like this.
dosomething < infile.txt > outfile.txt
There's nothing wrong with this as such, but it causes problems when you invoke the program from somewhere other than a Unix shell prompt. For example, if you need to invoke it from a compiled binary, things get complicated because you have to juggle pipes and related plumbing.
Because of this you should always make it possible to specify inputs and outputs as plain files. For new programs this is simple. It is a bigger problem for existing programs that already have hundreds of lines of code reading and writing stdin and stdout directly; refactoring them to use files might be a lot of work for little visible gain.
No matter: the C standard library has you covered. It has a function called freopen that opens a file and associates an existing stream with it. To redirect stdin and stdout to files you just need to do this at the beginning of your program:
if (!freopen(ifilename, "r", stdin))  { perror(ifilename); exit(EXIT_FAILURE); }
if (!freopen(ofilename, "w", stdout)) { perror(ofilename); exit(EXIT_FAILURE); }
Sunday, August 2, 2015
Not handling filenames with spaces should be classified as a fatal bug
A lot of programs (even commonly used ones) fail miserably if you try to give them file names with spaces in them. The most common way to fail is to pass filenames to the shell unquoted. When you try to make people fix these issues you usually get the same response:
Not fixing, just rename all your files and dirs to not have spaces in them.
This is both understandable and totally misguided. Failing on spaces in filenames is not really about those files; it's about not properly sanitizing your input and output. To see why this is a problem, just imagine what would happen if you passed the following as a filename to a program that uses it in a shell command invocation:
; rm -rf /;
Because of this, every case of an application failing on spaces in file names should be classified at the same severity level as an SQL injection vulnerability.
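The difference is easy to demonstrate outside the shell as well. Here is a small sketch in Python (the wc example and function names are hypothetical): pasting a file name into a command string hands it to the shell for interpretation, while passing an argument vector does not:

```python
import subprocess

def count_lines_unsafely(filename):
    # DANGEROUS: the file name is pasted into a shell command line, so
    # spaces split it into several arguments and a name like
    # "; rm -rf /;" would be executed as a command. Do not do this.
    return subprocess.run("wc -l < " + filename, shell=True)

def count_lines(filename):
    # Safe: the file name travels as a single argv entry, so no shell
    # ever interprets it, spaces and semicolons included.
    result = subprocess.run(["wc", "-l", filename],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.split()[0])
```

The safe version works unchanged on any file name the filesystem allows, which is exactly the property the buggy programs above lack.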
Saturday, July 25, 2015
Running Linux installs with a different CPU
Every now and then you might need a Linux distribution for a CPU that you do not have. As an example you might want to run an ARM distro on an x86_64 machine. This is not only possible but quite simple. Here's how you would do it on Ubuntu or Debian.
First install the qemu-user-static package and create a subdirectory to hold your distro install. Then run the first stage install. Using Debian sid as an example:
sudo debootstrap --arch=armhf --foreign sid sid-arm
Once that has finished, copy the qemu binary into the new install. This makes it possible to run ARM binaries transparently on an Intel CPU.
sudo cp /usr/bin/qemu-arm-static sid-arm/usr/bin
Then finish the bootstrap operation.
sudo chroot sid-arm /debootstrap/debootstrap --second-stage
And now we are done. Chrooting into sid-arm and running uname -a prints this:
Linux userland 4.1.0-2-generic #2-Ubuntu SMP Wed Jul 22 18:19:08 UTC 2015 armv7l GNU/Linux
At this point you can use the chroot as if it were a native installation. You can install packages with apt, compile code with gcc and do anything else you might need. In theory you could even run it as a container with systemd-nspawn, but unfortunately qemu is missing some functionality needed for that. The bug report is here. The only downside of this setup is that because the CPU is emulated, all binaries run a bit slowly.
Tuesday, July 21, 2015
List of achievements for an open source project
- [ ] first commit
- [ ] making the code public
- [ ] first checkout by someone else
- [ ] first user
- [ ] first bug report
- [ ] first mention in a blog post by someone you don't know
- [ ] first submitted patch
- [ ] approval to Debian/Fedora
- [ ] a magazine article mentioning your project
- [ ] a magazine article about your project
- [ ] first conference presentation
- [ ] a project subreddit is created
- [ ] a conference presentation about your project held by someone you don't know
- [ ] a bug report/patch from Linus Torvalds/someone else well known
- [ ] RMS claims your project is anti-freedom
- [ ] the project gets forked
- [ ] an O'Reilly animal book gets published
- [ ] a conference wholly about your project
- [ ] someone reimplements your project in a new language in 1/10th of the time it took to write the original one
Wednesday, July 15, 2015
Strange requirements of low level packages
As some of you might know, I'm the main developer of the Meson build system. It is implemented in Python 3 but has strict rules about dependencies. Every now and then this confuses people, so here's an explanation of why we do things slightly unconventionally.
One of the greatest things about modern development environments is the number of third party packages and how easy they are to use. This is why Meson has spent a lot of effort on creating a simple way to get dependencies automatically on multiple platforms. This system is called Wrap, but this text is not about that. Instead it's about the lack of dependencies in Meson itself. The policy is that in order to run Meson and its test suite, you are only allowed to depend on the Python standard library. No external dependencies are allowed. For non-core stuff (coverage, static analysis) external dependencies are OK.
This seems a bit strange and has caused us to have to reinvent some wheels. There are valid reasons for this, though, which have to do with the fact that packages at the lowest levels of the software stack have requirements that regular applications don't. The most important of these is that a build system needs to work in very limited distros and environments.
Let's look at the Mer project as an example. It aims to provide an extremely slimmed down Linux distribution for mobile devices. They support only a small number of packages by design. There are only 8 Python packages in total, six of which are extension packages and thus not useful for a build system. If we wanted to get Meson in there, what would that actually mean?
If Meson depends on any other package, it cannot be accepted into Mer. Since source bundling is prohibited and the system does not allow network access during package creation, it just can't be done. Either you disable the part that requires the extension or you don't get your package accepted. You might get an exception, but you need to apply for it and that takes time and effort from many people. If only the unit test system depends on external packages you can disable it, but then you can't tell whether your project is working correctly. That is not acceptable either.
There are a bunch of distros like this. Then there are the BSDs that might not have your dependencies in their repos. Rather than force them all to deal with this dependency issue we instead write our code so that it has no external dependencies. This means more work for us but less for everyone else. Luckily Python 3's standard library is very good and has almost everything we need.
Another use case that many people don't think about is that a lot of development is done on machines that neither have Internet access nor the capability to install additional software. Several corporations have requirements such as these for real or imagined security reasons. Even some OSS development is done like this. Requesting permission to use a new dependency can take six weeks.
I'm not saying that this is right or wrong or even sane. What I'm saying is that this is reality and that you ignore reality at your own peril.
In an environment like this getting Meson approved for use takes only one round of bureaucracy. Every dependency requires an additional round (if you are lucky). To reduce friction and ease uptake, we must again cut down on our dependencies. The easier Meson is to deploy, the more people are willing to switch to it.
There are other advantages, too. It is easier to get contributions from Windows and OSX developers, since they can hack on the code without any additional steps. You'd be surprised how tough it can be to download Python dependencies, even with setuptools. Deployment is easier, too: "download Python 3 and Meson, run" as opposed to "download Python 3 and Meson, use setuptools to move it places, then install these dependencies and then set up these paths and ...".
Because of these reasons we don't use external dependencies. This is the price you pay for being at the very bottom of the software stack.
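To give a flavor of what the policy means in practice, here is a hypothetical sketch (not actual Meson code): jobs that often pull in third-party packages elsewhere, such as command line parsing and archive checksums, done with nothing but the standard library:

```python
import argparse
import hashlib

def make_parser():
    # argparse instead of a third-party CLI framework.
    parser = argparse.ArgumentParser(prog="stdlib-only-tool")
    parser.add_argument("--buildtype", default="debug",
                        choices=["debug", "release"])
    return parser

def archive_checksum(payload: bytes) -> str:
    # hashlib instead of an external hashing or crypto dependency,
    # e.g. for verifying a downloaded source archive.
    return hashlib.sha256(payload).hexdigest()
```

Neither function is as featureful as the popular third-party alternatives, but both run anywhere Python 3 runs, with nothing to install first.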