Wikipedia:Reference desk/Archives/Computing/2015 November 5

Computing desk


November 5

grep for shell glob patterns?

So we've got grep, egrep, and fgrep that accept different variations on regular expressions, and in fact these days they're all the same program, with (if you wish) -E, -F, and -G options to select the regexp style.

And as we all know, a regular expression is not the same thing as a shell glob pattern.

So my question is, has anyone ever come across a shell-callable grep variant that matches glob patterns? Or am I going to have to write my own? (For pedantry's sake I may have to call it ggpp.) --Steve Summit (talk) 02:13, 5 November 2015 (UTC)

I suppose it's possible someone has written one, but it's not widely distributed, probably because if you have a shell, you can just do the searching with the shell, using a while-read loop. --71.119.131.184 (talk) 03:00, 5 November 2015 (UTC)
Well, the point is that the shell generally wants to expand glob patterns to lists of filenames, and that doesn't help if you want to apply a glob pattern to something else. So the issue mostly isn't how you iterate, but how you test. However, there is at least one exception to that "generally", and that is case statements. So consider this code:

while read line
do
    case "$line" in
        $pat) printf "%s\n" "$line";;
    esac
done

Unfortunately, this fails if the input lines start or end with whitespace, because read strips it, at least in both the sh and bash implementations I have access to. I guess you could do this:

while read line
do
    case "/$line/" in # Note: / is just an arbitrary character here, not a regexp delimiter
        /$pat/) printf "%s\n" "$line";;
    esac
done

But maybe there is some newfangled easier way that I don't know. Hey, I hadn't even heard of printf as a shell command until I started writing this. --70.49.170.168 (talk) 07:36, 5 November 2015 (UTC)
Oops, never mind that last part; it attempts to apply protection too late, after read has already stripped the whitespace. --70.49.170.168 (talk) 07:38, 5 November 2015 (UTC)
Use while IFS= read -r line. This stops read from modifying the input: the empty IFS prevents stripping of leading and trailing whitespace, and -r prevents backslashes from being treated as escape characters. You should always use read like this for reading in whole lines, unless you know you want other behavior. See my link to the BashFAQ above for more details. You may also want to look at the bash manual and the POSIX standard for more information on how read works. (read is a shell builtin, so man read won't give you anything, which is something that often trips people up.) --71.119.131.184 (talk) 08:22, 5 November 2015 (UTC)
Thanks for the IFS= trick. I've used IFS to modify the behavior of read before, but never thought of setting it to a null string. --70.49.170.168 (talk) 21:18, 5 November 2015 (UTC)
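Putting the two suggestions above together (the case test and IFS= read -r), a minimal sketch of a glob-matching filter in plain sh might look like the following. The function name globgrep is hypothetical, not an existing utility, and pat and line are ordinary global variables because POSIX sh has no local.

```shell
# globgrep PATTERN: print each stdin line that matches the glob PATTERN
# as a whole line. "globgrep" is a made-up name for illustration only.
globgrep() {
    pat=$1
    # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
    # The "|| [ -n "$line" ]" part also handles a last line with no newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            $pat) printf '%s\n' "$line" ;;   # unquoted: treated as a glob
        esac
    done
}

# Example: the line with leading spaces is preserved and still matches.
printf '%s\n' '  indented.txt' 'notes.txt' 'image.png' | globgrep '*.txt'
```

Unlike filename globbing, a case pattern gives no special treatment to slashes or leading dots, so * here matches any run of characters.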
After sorting out the whitespace, you can just use [[ $line == $pat ]] in bash. --Tardis (talk) 13:43, 5 November 2015 (UTC)
You can? <tries it> I'm stunned. Whouda thunk? I won't be using that, but thanks for providing a new data point for the old adage that you learn something new every day. --Steve Summit (talk) 15:14, 5 November 2015 (UTC)
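For anyone who wants to try the [[ ... ]] form, here is a rough bash-only sketch. The function name globgrep_bash is hypothetical, and the code will not run under a plain POSIX sh such as dash.

```shell
#!/usr/bin/env bash
# globgrep_bash PATTERN: print stdin lines matching the glob PATTERN.
# Hypothetical name; requires bash because of [[ ... ]] and local.
globgrep_bash() {
    local pat=$1 line
    while IFS= read -r line; do
        # An unquoted right-hand side of == is treated as a glob pattern;
        # quoting it ("$pat") would force a literal string comparison.
        if [[ $line == $pat ]]; then
            printf '%s\n' "$line"
        fi
    done
}

printf '%s\n' 'foo.c' 'foo.h' 'bar.c' | globgrep_bash 'foo.?'
```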
Also, to be pedantic, grep -F/fgrep doesn't use regexes at all. It searches for "fixed" strings, hence the name. As our grep article kind of tells you, it dates back to V7 Unix, when computers were much, much slower. Having a program that didn't do any regex interpretation made sense for when you knew you just wanted to do string searches. --71.119.131.184 (talk) 03:12, 5 November 2015 (UTC)
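A small illustration of the fixed-string point, using only standard grep options: the dot is a metacharacter in the default (basic regex) mode but an ordinary character under -F.

```shell
# Default mode: "." matches any single character, so both lines match.
printf '%s\n' 'a.c' 'abc' | grep 'a.c'

# Fixed-string mode: only the line containing a literal "a.c" matches.
printf '%s\n' 'a.c' 'abc' | grep -F 'a.c'
```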

Thanks, guys. In this case the patterns won't be known 'til run time (so the case trick won't work without a whole lot of hacking), and the script is bordering on the slow side already, employing lots of shelly machinations for each directory it processes, so once I'm down to iterating over the filenames in a directory, I'd like to do so with a one-pass external program, not more shell loops.

I do have my own glob-matching code lying around somewhere (and even my own version of grep, for that matter), so integrating the two won't be much trouble. (Heck, I might even integrate into GNU grep, and contribute my mods back to the FSF.)

To be even more pedantic, even back in the days when Unix was young and computers were "slow", the word on the street was that egrep was faster than fgrep. fgrep is useful when I do want a fixed-string match with no special characters interpreted, but if I care about speed I always use egrep. --Steve Summit (talk) 14:51, 5 November 2015 (UTC)

It's an interpreter! The case trick does not require the pattern to be known in advance. Note that I wrote $pat in there: the variable is expanded at run time, and because it is unquoted, its value is used as a glob pattern. And it works in sh, so you don't need to bring bash into it. --70.49.170.168 (talk) 00:52, 6 November 2015 (UTC)
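To make the run-time point concrete, here is a small sketch in plain sh. The helper names matches_glob and matches_literal are invented for illustration. The only difference between them is whether the expansion in the case pattern is quoted.

```shell
# matches_glob STRING PATTERN: succeeds if STRING matches PATTERN as a glob.
matches_glob() {
    case "$1" in
        $2) return 0 ;;      # unquoted expansion: interpreted as a pattern
    esac
    return 1
}

# matches_literal STRING PATTERN: succeeds only if the two are identical.
matches_literal() {
    case "$1" in
        "$2") return 0 ;;    # quoted expansion: compared character by character
    esac
    return 1
}

pat='*.log'   # could just as well come from "$1" or a config file
matches_glob    'build.log' "$pat" && echo 'glob: match'
matches_literal 'build.log' "$pat" || echo 'literal: no match'
```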

ixquick and startpage.com, business model?

ixquick and startpage.com offer web search for free, by using a proxy that anonymizes Google results (startpage.com) or by aggregating the results from several search engines (ixquick). They have won awards for not tracking their users. And there are no ads. So how do they earn money? --NorwegianBlue talk 08:11, 5 November 2015 (UTC)

This company is privately owned - and they even proudly acknowledge that they "do not report financial information." They have no legal obligation to explain how they are profitable; they probably have no legal liability if they lie about being profitable. (It's the same reason that you don't have to publish your paycheck stubs on your Wikipedia user page: nothing obliges you to.) The company webpage claims that they are profitable and that some of their revenue comes from sponsored search results. Along the same lines, you could claim that you're making tons of money, and nobody could prove it one way or another using public information. You're allowed to tell your friends that you're a super-rich, super-profitable millionaire, in a feeble and superficial effort to impress them with material wealth - and this is perfectly fine and legal, whether it is true or false - unless you're lying to them in order to sell regulated securities. Privately owned corporations generally have the very same freedom: they don't have to "post their pay stubs" to back up their statements. For all we know, every word of it is true!
You could mail the Ixquick press contacts to see if they have any statement or investor-information that is not hosted on their website.
Nimur (talk) 09:23, 5 November 2015 (UTC)
Thanks! NorwegianBlue talk 15:44, 5 November 2015 (UTC)
They do show ads. Try searching for something buyable, like "toaster". This page says "We earn 99 procent [sic] of the money from the ads we show on our results pages." Since they're just an anonymizing proxy for other companies' services, their operating costs are probably low, so there's no reason they couldn't be profitable. -- BenRG (talk) 20:31, 5 November 2015 (UTC)