14 11 2013
Processing data with PHP using STDIN and Piping
PHP streams are still lacking in documentation and are rarely used compared to other PHP features. This is a shame because they can be really powerful and I have used them to gain a lot of performance when doing things such as processing log files.
One of the more powerful features of Linux is the ability to pipe data in from another program. It is often faster to offload a task to an existing Linux userspace program than to do it in PHP, and as an added benefit the work is spread across several processes, and therefore several cores, which a single standard PHP process cannot do on its own.
For example, say that you need to read a compressed gzip file and process it line by line in PHP. PHP has functions in its standard library to do this, such as gzopen and gzgets. This works just fine, but decompression and processing both happen in one process, so the two steps cannot overlap and the total time grows with the amount of data you need to read.
With a pipe, we can instead let the zcat command decompress the file and feed its output to PHP like so:
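For comparison, here is a minimal sketch of the single-process approach using gzopen and gzgets. The function name process_gz_lines and its callback parameter are made up for illustration, and the 65536 byte limit per read is an arbitrary assumption.

```php
<?php
// Read a gzip-compressed file line by line inside the PHP process.
// process_gz_lines and $onLine are hypothetical names for this sketch.
function process_gz_lines($path, callable $onLine)
{
    $handle = gzopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Could not open $path");
    }

    $count = 0;
    // gzgets() returns false at end of file. The second argument caps how
    // many bytes are read per call, so very long lines arrive in pieces.
    while (false !== ($line = gzgets($handle, 65536))) {
        $onLine($line);
        $count++;
    }

    gzclose($handle);

    return $count;
}
```

Decompression and the callback both run in the same PHP process here, which is exactly the work the pipe version splits in two.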
    zcat mylog.gz | php process_log.php
By doing this, Linux uncompresses the data in a separate process. If you have a multi-core CPU, the two processes run in parallel, and data is fed to PHP as soon as it is ready.
You can do even more powerful things with this approach. Suppose you need to filter the compressed data as well: you could use zgrep to filter lines out before they reach PHP.
    zgrep -v 'excludeme' mylog.gz | php test.php
You can go further still and swap in multi-core gzip tools such as pigz, so that more of the decompression and filtering work is spread across cores before the data reaches PHP.
I’m a little off topic at this point, but the explanation above shows why you would want to do this and where it performs well compared with PHP’s built-in tools for reading and writing files.
Let’s get back to PHP.
In the CLI, PHP defines the STDIN constant, which is equivalent to:
    fopen("php://stdin", "r");
To test it out, make a file called stdin.php with the following:
    <?php

    //stream_set_blocking(STDIN, 0);

    while (false !== ($line = fgets(STDIN))) {
        echo $line;
    }
Then run the following commands on the command line from the same directory:
    dmesg | php stdin.php

    uptime | php stdin.php

    zcat /var/log/apache2/access.log.1.gz | php stdin.php
The script will simply print whatever it receives. Ignore the commented line for now.
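Since STDIN is an ordinary stream resource, the same loop works on any stream. The sketch below wraps it in a function so it can be exercised without a pipe. The names stdin_stream and copy_lines are hypothetical, and the fallback assumes the script might run somewhere the STDIN constant is not defined.

```php
<?php
// Return a readable handle on standard input.
// The STDIN constant is defined by the CLI SAPI, so fall back to opening
// php://stdin when it is missing.
function stdin_stream()
{
    return defined('STDIN') ? STDIN : fopen('php://stdin', 'r');
}

// Copy every line from $in to $out and return the number of lines copied.
// Both arguments are plain stream resources, so STDIN and STDOUT, files and
// in-memory streams all work the same way.
function copy_lines($in, $out)
{
    $count = 0;
    while (false !== ($line = fgets($in))) {
        fwrite($out, $line);
        $count++;
    }

    return $count;
}

// Usage behind a pipe would be: copy_lines(stdin_stream(), STDOUT);
```

Passing the streams in as arguments is a design choice that makes the loop easy to test with php://memory instead of a real pipe.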
You are not limited to system utilities like these. If you have scripts written in other languages, then as long as they write to STDOUT you can use them alongside PHP without changing anything else; their output goes directly to PHP.
Suppose you have the following simple code in a file called printnumbers.py:
    for i in range(1, 10):
        print(i)
and the following small change to stdin.php:
    <?php

    echo "in php script\n";

    while (false !== ($line = fgets(STDIN))) {
        echo $line;
    }
You can pipe the output from the python script to PHP like so:
    python printnumbers.py | php stdin.php
This is a simple way to use scripts from multiple languages together. It is great for command line tools, because a lot of what you need has probably already been written, just not always in PHP.
The previous examples work fine, but what happens when no data is sent to standard input? Run the following:
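The PHP side can of course do more than echo. As a rough sketch, the function below adds up the numbers printed by a script such as printnumbers.py. The name sum_integer_lines and the file name sum.php in the comment are hypothetical, and skipping non-numeric lines is an assumption about what you would want.

```php
<?php
// Sum the integers found one per line on $stream.
// Lines that are not plain integers (blank lines, stray text) are skipped.
function sum_integer_lines($stream)
{
    $total = 0;
    while (false !== ($line = fgets($stream))) {
        $value = trim($line);
        if (preg_match('/^-?\d+$/', $value)) {
            $total += (int) $value;
        }
    }

    return $total;
}

// Usage from the shell, assuming a hypothetical sum.php that echoes
// sum_integer_lines(STDIN):
//   python printnumbers.py | php sum.php
```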
    php stdin.php
The script will sit there indefinitely (or until a preset limit is reached). Either way this is not the intended behaviour, and you will need to end it manually by pressing Ctrl+C.
The reason the script never completes is that stream operations are blocking by default: PHP reaches the fgets call and waits indefinitely for data that never arrives.
This is a problem if you want to add some error checking to your application and exit when it is not used correctly. My first thought was stream_set_timeout, but I was never able to get it to work as advertised on STDIN (more on that below).
If you uncomment the stream_set_blocking line in stdin.php, PHP will not wait for a value and will continue immediately. However, I found that this causes its own issues: when there is a legitimate delay before more data arrives on STDIN, fgets returns false and PHP ends the loop.
There is a function, stream_set_timeout, that I believe is supposed to be used in this situation. It is meant to let you set a timeout period after which a flag is set on the stream, allowing you to exit a loop or end your program, for example. While looking into it I came across a bug that was first reported several years ago: https://bugs.php.net/bug.php?id=22837
In my tests, I was never able to get it to work with STDIN. For example, in this test script the timed_out key is never true:
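The underlying problem is that a false from fgets can mean either "nothing has arrived yet" or "the input is finished". A minimal sketch of telling the two apart with feof follows. The name read_available_lines is hypothetical, and the sketch assumes a pipe or file stream such as STDIN rather than a network socket.

```php
<?php
// Drain whatever fgets() can return right now from a non-blocking stream.
// Returns array(lines, eof). eof only becomes true once the writer has
// closed its end, so the caller can tell "nothing yet" from "finished".
function read_available_lines($stream)
{
    stream_set_blocking($stream, false);

    $lines = [];
    // In non-blocking mode fgets() returns false as soon as no data is
    // waiting, instead of pausing the script.
    while (false !== ($line = fgets($stream))) {
        $lines[] = $line;
    }

    return [$lines, feof($stream)];
}
```

A caller would keep calling this until the second element is true, and decide for itself how long an empty result is acceptable.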
    <?php

    stream_set_blocking(STDIN, 0);
    stream_set_timeout(STDIN, 1);

    while (1) {
        $info = stream_get_meta_data(STDIN);
        var_dump($info['timed_out']);
    }
Perhaps this functionality only works with sockets. Regardless, I needed a solution for my own use.
I came up with the following test code, which works as before but exits if no data arrives on STDIN within a defined timeout period.
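One thing that may help narrow this down is the return value of stream_set_timeout, which is a boolean. The sketch below reports when a stream does not accept the timeout. The name apply_read_timeout and the $errorStream parameter are hypothetical, and which stream types accept a timeout is something to verify on your own PHP version.

```php
<?php
// Try to set a read timeout and report when the stream does not accept it.
// If stream_set_timeout() returns false, the timeout was not applied and
// there is no point polling the 'timed_out' flag afterwards.
function apply_read_timeout($stream, $seconds, $errorStream)
{
    if (stream_set_timeout($stream, $seconds)) {
        return true;
    }

    $meta = stream_get_meta_data($stream);
    fwrite($errorStream, sprintf(
        "Read timeout not applied to %s stream\n",
        $meta['stream_type']
    ));

    return false;
}

// Usage on the command line would be: apply_read_timeout(STDIN, 1, STDERR);
```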
    <?php

    stream_set_blocking(STDIN, 0);

    const TIMEOUT_SECONDS = 3;

    $timeoutStarted = false;
    $timeout = null;

    while (1) {
        // Read everything that is available right now.
        while (false !== ($line = fgets(STDIN))) {
            echo $line;

            // Data arrived, so reset any running timeout.
            if ($timeoutStarted) {
                $timeoutStarted = false;
                $timeout = null;
            }
        }

        // The other end of the pipe has closed: we are done.
        if (feof(STDIN)) {
            echo "feof\n";
            break;
        }

        // No data and no EOF: start the clock if it is not already running.
        if (null === $timeout) {
            $timeout = time();
            $timeoutStarted = true;
            continue;
        }

        if (time() > $timeout + TIMEOUT_SECONDS) {
            echo "timeout\n";
            break;
        }
    }

    echo "done\n";
If no data arrives on STDIN for 3 seconds after the last input, the program exits. I packaged this code up into a simple reusable function that is not fully complete, but abstracts some of the functionality away:
    <?php

    /**
     * @param resource $stream         stream to read from, e.g. STDIN
     * @param array    $callbacks      optional 'line', 'feof' and 'timeout' callbacks
     * @param int      $timeoutSeconds seconds to wait for new data before giving up
     */
    function non_block_process(&$stream, array $callbacks = [], $timeoutSeconds = 3)
    {
        stream_set_blocking($stream, 0);

        // Any callback that is not supplied falls back to a no-op.
        $defaultCallbacks['line'] = function(&$line) {};
        $defaultCallbacks['timeout'] = $defaultCallbacks['feof'] = function() {};
        $callbacks = $callbacks + $defaultCallbacks;

        $timeoutStarted = false;
        $timeout = null;

        while (1) {
            while (false !== ($line = fgets($stream))) {
                $callbacks['line']($line);

                if ($timeoutStarted) {
                    $timeoutStarted = false;
                    $timeout = null;
                }
            }

            if (feof($stream)) {
                $callbacks['feof']();
                break;
            }

            if (null === $timeout) {
                $timeout = time();
                $timeoutStarted = true;
                continue;
            }

            if (time() > $timeout + $timeoutSeconds) {
                $callbacks['timeout']();
                break;
            }
        }
    }

    $stdin = STDIN;
    $callbacks = [
        'line' => function(&$line) { echo $line; },
        'feof' => function() { echo "feof\n"; },
        'timeout' => function() { echo "timeout\n"; },
    ];

    non_block_process($stdin, $callbacks);
The above code works the same as the code before, but packages up a few things. It accepts an array of callbacks to run at certain points. At the moment it does not check that those callbacks are valid (we could do this with reflection later), and it needs more error checking to make it robust, such as making sure $stream is actually a stream.
If you replace the contents of stdin.php with the above and re-run the earlier examples, you will see the same output as before, but when you run the script with no pipe, the program exits after 3 seconds. This ensures the script does not run indefinitely and lets us exit gracefully.
The above code was tested with PHP 5.4. If there are better ways to handle this situation, please discuss in the comments.
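One possible alternative, sketched here and not tested against PHP 5.4, is to let stream_select do the waiting instead of polling with time(). The function names read_line_with_timeout and consume_lines are hypothetical. The sketch also includes the kind of stream check mentioned earlier as missing, and it assumes the stream is left in blocking mode.

```php
<?php
// Wait up to $timeoutSeconds for input on $stream.
// Returns the next line, null on timeout, or false at end of input.
function read_line_with_timeout($stream, $timeoutSeconds = 3)
{
    // Make sure we were actually given a stream.
    if (!is_resource($stream) || get_resource_type($stream) !== 'stream') {
        throw new InvalidArgumentException('Expected a stream resource');
    }

    $read = [$stream];
    $write = null;
    $except = null;

    // Waits until the stream is readable or the timeout expires, without
    // spinning in a PHP loop.
    $ready = stream_select($read, $write, $except, $timeoutSeconds);
    if ($ready === false) {
        throw new RuntimeException('stream_select() failed');
    }
    if ($ready === 0) {
        return null;
    }

    // "Readable" also covers end of input, where fgets() returns false.
    // Caveat: if only part of a line has arrived, fgets() may wait for the rest.
    return fgets($stream);
}

// Feed every line to $onLine and report why reading stopped.
function consume_lines($stream, callable $onLine, $timeoutSeconds = 3)
{
    while (true) {
        $line = read_line_with_timeout($stream, $timeoutSeconds);
        if ($line === null) {
            return 'timeout';
        }
        if ($line === false) {
            return 'eof';
        }
        $onLine($line);
    }
}

// Usage behind a pipe would be:
//   consume_lines(STDIN, function ($line) { echo $line; }, 3);
```

The timeout here restarts on every call, so like the loop above it measures the gap since the last line and not the total running time.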
You can detect if STDIN is interactive by using the posix function posix_isatty()
http://docs.php.net/manual/en/function.posix-isatty.php
Thanks for the tip, I also found this post: http://stackoverflow.com/questions/11327367/detect-if-a-php-script-is-being-run-interactively-or-not which accounts for more edge cases depending on the script.
Very few people seem to check the return value from stream_set_timeout() which is always worth doing.
I have tried using proc_open() and then reading from the pipe to get the output of the other process, but stream_set_timeout() does not work when applied to that pipe (and indeed it correctly returns FALSE to indicate that it is not going to work).
I’m testing with PHP 5.4.16 and CentOS linux… I ended up using stream_set_blocking( $pipe_handle, false ) and then jiggering around with loops and sleeps, a bit like you did, not quite what I would have liked.
If you want to use a timeout you must set blocking to yes, otherwise it will not wait. Tested on PHP 5.6 and CentOS:
    <?php
    stream_set_blocking(STDIN, 1);
    stream_set_timeout(STDIN, 1);
    while (1) {
        $info = stream_get_meta_data(STDIN);
        var_dump($info['timed_out']);
    }