Processing data with PHP using STDIN and Piping

PHP streams are still poorly documented and rarely used compared to other PHP features. This is a shame, because they can be really powerful; I have used them to gain a lot of performance when doing things such as processing log files.

One of the more powerful features of Linux is the ability to pipe data in from another program. It is often faster to offload a task to an existing Linux userspace program than to do it in PHP, and as an added benefit each program in the pipeline runs as its own process, so the work can be spread across multiple cores, which a single standard PHP script cannot do on its own.

For example, say that you need to read a gzip-compressed file and process it line by line in PHP. PHP has functions in its standard library to do this, such as gzopen and gzgets, and they work just fine. However, decompression and your own processing then happen in a single process, so both compete for one core as the amount of data you need to read grows.
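As a rough sketch of the built-in approach, the following reads a gzip file line by line with gzopen and gzgets. The helper name count_gz_lines and its optional handler argument are assumptions made for illustration, and the zlib extension must be available.

```php
<?php
// Sketch only: count_gz_lines() is a hypothetical helper, not code from the article.
// Requires PHP's zlib extension (gzopen/gzgets/gzclose).

function count_gz_lines($path, $handler = null)
{
    $gz = gzopen($path, 'rb');
    if ($gz === false) {
        throw new RuntimeException("Could not open $path");
    }

    $count = 0;
    // gzgets() returns false at end of file; decompression happens in this process.
    while (($line = gzgets($gz)) !== false) {
        if ($handler !== null) {
            // Hand each line, without its trailing newline, to the caller.
            call_user_func($handler, rtrim($line, "\r\n"));
        }
        $count++;
    }

    gzclose($gz);
    return $count;
}
```

Everything here, decompression included, runs inside the one PHP process, which is the limitation the pipe-based approach avoids.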

By using a pipe in Linux, we could instead run the zcat command and feed its output straight into PHP, for example: zcat access.log.gz | php process.php (both file names are placeholders).

By doing this, Linux uncompresses the data in a separate process. If you have a multi-core CPU, the two processes run in parallel, and data is fed to PHP as soon as it is ready.

You can do even more powerful things with this approach. Suppose you need to filter the compressed data as well: you could use zgrep to keep only the matching lines, so that the unwanted ones never reach PHP.

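You can go further still by using multi-core gzip tools such as pigz with zgrep, so that the uncompressing and filtering run in their own processes before the data reaches PHP. Be aware that pigz parallelises compression far more than decompression, so the gain when uncompressing is usually modest.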

I’ve drifted a little off topic here, but the point of the previous explanation was to show why you would want to do this, and where it performs well compared with PHP's built-in tools for reading and writing files.

Let’s get back to PHP.

When run from the command line, PHP defines the STDIN constant, which is an already-open stream equivalent to fopen('php://stdin', 'r').

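To test it out, make a file called stdin.php that reads STDIN line by line with fgets and prints each line, with a commented-out stream_set_blocking call above the loop.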
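The original listing is not reproduced here, so the following is only a minimal sketch of what stdin.php might look like. The helper name echo_lines and the guard around the final call are assumptions added so the function can also be reused on other streams.

```php
<?php
// stdin.php (sketch): print every line that arrives on standard input.
// echo_lines() is a hypothetical helper name used for illustration.

function echo_lines($in, $out)
{
    // stream_set_blocking($in, false); // ignore this line for now

    $count = 0;
    // fgets() waits until a line is available and returns false at end of input.
    while (($line = fgets($in)) !== false) {
        fwrite($out, $line);
        $count++;
    }
    return $count;
}

// Read from STDIN only when this file is the script being run.
if (PHP_SAPI === 'cli' && isset($argv[0]) && realpath($argv[0]) === __FILE__) {
    echo_lines(STDIN, STDOUT);
}
```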

To try it out, pipe the output of another command into the script from the same directory, for example: echo "hello world" | php stdin.php

The script will simply print whatever it receives. Ignore the commented line for now.

You are not limited to these types of programs. If you have scripts written in other languages, as long as they write to STDOUT you can use them alongside PHP without changing anything else; their output goes directly to PHP.

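Suppose you have a simple Python script in a file called printnumbers.py that, as its name suggests, prints a series of numbers to STDOUT, and you make a small change to stdin.php to handle them. You can then pipe the output of the Python script into PHP, for example: python printnumbers.py | php stdin.php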
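The exact change to stdin.php is not shown, so the following is purely a hypothetical variation: it assumes the other script prints one number per line and totals whatever arrives. The name sum_numeric_lines is invented for this sketch.

```php
<?php
// Hypothetical variation on stdin.php: total the numbers received, one per line.
// sum_numeric_lines() is an illustrative name, not the article's original change.

function sum_numeric_lines($in)
{
    $total = 0;
    while (($line = fgets($in)) !== false) {
        $value = trim($line);
        // Skip blank or non-numeric lines rather than failing on them.
        if (is_numeric($value)) {
            $total += $value + 0; // "+ 0" converts the numeric string to int or float
        }
    }
    return $total;
}

// Example invocation: python printnumbers.py | php stdin.php
if (PHP_SAPI === 'cli' && isset($argv[0]) && realpath($argv[0]) === __FILE__) {
    echo sum_numeric_lines(STDIN), PHP_EOL;
}
```

The PHP side neither knows nor cares which language produced the lines; it only sees text on its input stream.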

This is a simple way to use scripts from multiple languages together. It is great for command line tools, because a lot of the things you need have probably already been written, just not always in PHP.

The previous examples work fine, but what happens when no data is sent to standard input? Run the script on its own, with nothing piped in: php stdin.php

The script will run indefinitely, or until preset limits are reached. Either way this is not the intended behaviour, and you will need to end the execution manually by pressing Ctrl+C.

The reason the script never completes is that stream operations are blocking by default, meaning that PHP gets to the fgets call and waits indefinitely for data that never arrives.

This is an issue if you want to add some error checking to your application and exit when it is not used correctly. I thought that the stream_set_timeout function was the answer, but I was never able to get it to work as advertised on PHP's STDIN.

If you uncomment the stream_set_blocking line, PHP will not wait around for a value and will continue immediately. However, I found that this can cause its own issues: when there is a legitimate delay before new STDIN data arrives, fgets returns false and PHP ends the loop.
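To make that failure mode concrete, here is a small sketch in which a local socket pair stands in for STDIN (an assumption made so the example is self-contained; it expects a Unix-like system). The function name demo_non_blocking_read is invented for illustration.

```php
<?php
// Sketch: show fgets() returning false on a non-blocking stream with no data yet.
// A local socket pair stands in for STDIN; assumes a Unix-like system.

function demo_non_blocking_read()
{
    list($reader, $writer) = stream_socket_pair(
        STREAM_PF_UNIX,
        STREAM_SOCK_STREAM,
        STREAM_IPPROTO_IP
    );
    stream_set_blocking($reader, false);

    // Nothing has been written yet, so fgets() gives up straight away.
    $early = fgets($reader);

    // The "delayed" data only arrives after the first read attempt.
    fwrite($writer, "late line\n");
    $late = fgets($reader);

    fclose($reader);
    fclose($writer);

    return array('early' => $early, 'late' => $late);
}

var_dump(demo_non_blocking_read());
```

A loop of the form while (($line = fgets(...)) !== false) would have stopped at the first read and never seen the late line.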

There is a function, stream_set_timeout, that I believe is supposed to be used in this situation. It is meant to let you set a timeout period after which a flag is set on the stream, allowing you to exit a loop or end your program, for example. However, I came across a bug that was first reported several years ago: https://bugs.php.net/bug.php?id=22837.

In my tests, I was never able to get it to work with PHP's STDIN. For example, in a test script that sets a timeout on STDIN and inspects the stream's metadata after each read, the timed_out key is never true.
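The original test script is not reproduced here; a sketch along the same lines might look like the following. The helper name check_timeout_flag is an assumption, and as far as I understand it, stream_set_timeout simply returns false for stream types that do not support a read timeout.

```php
<?php
// Sketch of such a test; check_timeout_flag() is a hypothetical helper.

function check_timeout_flag($stream, $seconds)
{
    // Returns false when the stream type does not accept a read timeout.
    $accepted = stream_set_timeout($stream, $seconds);

    $line = fgets($stream);
    $meta = stream_get_meta_data($stream);

    return array(
        'accepted'  => $accepted,
        'line'      => $line,
        'timed_out' => $meta['timed_out'],
    );
}

// Run this file directly with nothing piped in to watch fgets() keep waiting.
if (PHP_SAPI === 'cli' && isset($argv[0]) && realpath($argv[0]) === __FILE__) {
    var_dump(check_timeout_flag(STDIN, 3));
}
```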

Perhaps this functionality only works with sockets. Regardless, I needed a solution for my own use.

I came up with some test code that works as it did before, but exits if no STDIN data arrives within a defined timeout period.
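The original test code is not shown, so this is one possible way to get that behaviour, not necessarily the approach the article used. It relies on stream_select to wait for input with a timeout, assumes a Unix-like system where STDIN can be selected on, and uses the invented name read_until_idle.

```php
<?php
// Sketch: read lines from a stream, stopping after $idleSeconds without input.
// Uses stream_select(); assumes a Unix-like system where STDIN is selectable.

function read_until_idle($in, $out, $idleSeconds = 3)
{
    $lines = 0;
    while (true) {
        $read = array($in);
        $write = null;
        $except = null;

        // Wait up to $idleSeconds for $in to become readable.
        $ready = stream_select($read, $write, $except, $idleSeconds);
        if (!$ready) {
            // 0 means the wait timed out; false means select itself failed.
            return array('lines' => $lines, 'timed_out' => ($ready === 0));
        }

        $line = fgets($in);
        if ($line === false) {
            // End of input: the writing side closed the pipe.
            return array('lines' => $lines, 'timed_out' => false);
        }
        fwrite($out, $line);
        $lines++;
    }
}

if (PHP_SAPI === 'cli' && isset($argv[0]) && realpath($argv[0]) === __FILE__) {
    $result = read_until_idle(STDIN, STDOUT, 3);
    if ($result['timed_out']) {
        fwrite(STDERR, "No input received, exiting.\n");
    }
}
```

One caveat of this sketch: the timeout only covers the wait for new data, so a partial line with no trailing newline can still make fgets wait.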

If no STDIN data arrives for 3 seconds after the last input, the program exits. I packaged this code up into a simple reusable function that is not fully complete, but abstracts some of the functionality away.
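As a sketch of what such a wrapper could look like (the original function is not reproduced here), the version below takes a stream and an array of callbacks. The function name process_stream and the callback keys 'line', 'timeout' and 'end' are all hypothetical, and the callbacks and the stream are deliberately not validated.

```php
<?php
// Sketch of a reusable wrapper. The callback keys 'line', 'timeout' and 'end'
// are hypothetical names; neither the callbacks nor $stream are validated.

function process_stream($stream, array $callbacks, $idleSeconds = 3)
{
    while (true) {
        $read = array($stream);
        $write = null;
        $except = null;

        $ready = stream_select($read, $write, $except, $idleSeconds);
        if (!$ready) {
            // Timed out (0) or select failed (false); both are treated alike here.
            if (isset($callbacks['timeout'])) {
                call_user_func($callbacks['timeout'], $idleSeconds);
            }
            return false;
        }

        $line = fgets($stream);
        if ($line === false) {
            // End of input reached normally.
            if (isset($callbacks['end'])) {
                call_user_func($callbacks['end']);
            }
            return true;
        }
        if (isset($callbacks['line'])) {
            call_user_func($callbacks['line'], $line);
        }
    }
}

if (PHP_SAPI === 'cli' && isset($argv[0]) && realpath($argv[0]) === __FILE__) {
    process_stream(STDIN, array(
        'line'    => function ($line) { echo $line; },
        'timeout' => function ($s) { fwrite(STDERR, "No input for {$s}s, exiting.\n"); },
    ));
}
```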

The function works the same as the code before, but packages up a few things. It accepts an array of callbacks to run at certain times. At the moment it does not check that those callbacks are valid (we could do this with reflection later), and it needs more error checking to make it robust, such as making sure $stream is actually a stream.

If you replace the contents of the earlier example with this function and re-run the examples, you will see the same output as before, but when you run the script with no pipe, the program exits after 3 seconds. This ensures the script doesn’t run indefinitely and lets us exit gracefully.

The original code for this article was tested with PHP 5.4. If there are better ways to handle this situation, please discuss in the comments.
