Regex Greedy

I thought .* stopped at the next digit in a regex. I added .*? because I ran across the term "non-greedy". When does it stop? Why does it stop where it does? I've always used .* to sweep up the characters between expressions.


[Screenshot not preserved]


which results in:


[Screenshot not preserved]


You can see that .* is too greedy, while .*? does what I want. Why is this so?
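The screenshots did not survive, so here is a small sketch of the same comparison under some assumptions: it uses Python's re module (a backtracking engine that supports both .* and .*?) instead of the editor in the screenshots, borrows the file name that appears later in the thread as sample text, writes [[:digit:]] as [0-9] because Python's re has no POSIX character classes, and wraps the digits in an extra group only so they can be printed.

```python
import re

# Sample text borrowed from the file name logged later in the thread (an assumption).
filename = "Tax - 0002.jpg"

# [0-9] stands in for [[:digit:]]; the middle group exists only so the digits can be shown.
greedy = re.search(r"(.*)([0-9][0-9][0-9]*)(.*)", filename)
lazy = re.search(r"(.*?)([0-9][0-9][0-9]*)(.*)", filename)

print(greedy.groups())  # ('Tax - 00', '02', '.jpg')
print(lazy.groups())    # ('Tax - ', '0002', '.jpg')
```

Both patterns match the whole name; the only difference is how much the first group claims. The greedy (.*) grabs as much as it can and hands back only the minimum two digits the rest of the pattern needs, while (.*?) stops at the first spot where two digits can follow.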


R

Mac mini, OS X Yosemite (10.10.5), Fall 2014; iPhone 4 7.1.2

Posted on Apr 18, 2018 1:54 PM

7 replies

Apr 18, 2018 5:26 PM in response to VikingOSX

I'm still a little confused about this.


(.*?)[[:digit:]][[:digit:]][[:digit:]]*(.*)


Why is [[:digit:]][[:digit:]][[:digit:]] being forced to the right? It seems these are acting greedy. I like to think of the leading (.*) as getting the stuff to the left of [[:digit:]][[:digit:]][[:digit:]]* and the trailing (.*) as getting the stuff to the right. I think of .* as affecting the stuff to its left and not the stuff to its right, but I'm seeing .* affect stuff both to the left and to the right. I would have thought the * in [[:digit:]][[:digit:]][[:digit:]]* said to maximize the number of 9s. I guess (.*) takes precedence over the stuff to the right of it.
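One way to see why the leading quantifier "wins" is to spell out the search order by hand. The helper below, split_like_engine, is hypothetical (not a library function) and is a simplification of real backtracking, assuming the pattern shape from this thread: a prefix, two or more digits, then a trailing (.*) that always succeeds. It tries prefix lengths longest-first for a greedy (.*) and shortest-first for a lazy (.*?), and accepts the first length where the digits can start.

```python
import re

# Two or more digits, written like the thread's [[:digit:]][[:digit:]][[:digit:]]*
# ([0-9] stands in for [[:digit:]], which Python's re does not support).
DIGITS = re.compile(r"[0-9][0-9][0-9]*")


def split_like_engine(text, lazy):
    """Hypothetical helper: mimic how a backtracking engine settles (.*) or (.*?).

    Greedy (.*) tries the longest prefix first and gives characters back one at
    a time; lazy (.*?) tries the shortest prefix first and takes one more
    character at a time. The first prefix length that lets the digits match wins.
    """
    lengths = range(len(text) + 1) if lazy else range(len(text), -1, -1)
    for i in lengths:
        m = DIGITS.match(text, i)  # the digits must start exactly where the prefix ends
        if m:
            # The trailing (.*) matches whatever is left, so it never forces a retry.
            return text[:i], m.group(), text[m.end():]
    return None


print(split_like_engine("Tax - 0002.jpg", lazy=False))  # ('Tax - 00', '02', '.jpg')
print(split_like_engine("Tax - 0002.jpg", lazy=True))   # ('Tax - ', '0002', '.jpg')
```

The * on the last digit class does take as many digits as it can, but only from the position the first group leaves it. A backtracking engine tries the leftmost quantifier's preferred choice first and changes it only when the rest of the pattern cannot match.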


R

Apr 18, 2018 7:50 PM in response to rccharles

I meant "the bigger picture". Are you trying to rename files or something? You mentioned that you were going to use sed. BBEdit is OK, but sed might behave much differently.


I think your pattern has too many wildcards. What happens if you have two sets of matching digits? I am a bit concerned about the "[[:digit:]]*". Although I do a lot of Perl, I still find regex really tricky.
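The question about two sets of digits is easy to try. A sketch, again in Python with [0-9] standing in for [[:digit:]], using a made-up name that contains two runs of digits:

```python
import re

# Hypothetical sample with two separate runs of digits.
name = "page 12 of 345"

# Same shape as the thread's pattern; the middle group is added only for display.
greedy = re.search(r"(.*)([0-9][0-9][0-9]*)(.*)", name)
lazy = re.search(r"(.*?)([0-9][0-9][0-9]*)(.*)", name)

print(greedy.groups())  # ('page 12 of 3', '45', '')
print(lazy.groups())    # ('page ', '12', ' of 345')
```

With the greedy prefix the match lands on the last run of digits and even splits it; the lazy prefix lands on the first run.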


And your captures at the beginning and end seem superfluous. If you just want to replace any sequence of 2 or more digits with "444", you could just search for "[0-9]{2,}" and replace with "444". There are many ways to do the same thing. There isn't necessarily one right way. But it does seem premature to be going into greedy vs. non-greedy searching for something like this.
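A minimal sketch of the capture-free approach, using Python's re.sub as a stand-in for an editor's find-and-replace. The sample name is hypothetical. Note that re.sub replaces every match unless count limits it, which is one way to handle the "two sets of digits" case.

```python
import re

PATTERN = r"[0-9]{2,}"  # two or more digits; no captures needed

name = "page 12 of 345"  # hypothetical sample with two runs of digits

every_run = re.sub(PATTERN, "444", name)           # replaces every match
first_run = re.sub(PATTERN, "444", name, count=1)  # stops after the first match

print(every_run)  # page 444 of 444
print(first_run)  # page 444 of 345
```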

Apr 18, 2018 7:49 PM in response to etresoft

Thanks.


Ended up with this:

set toUnix to "echo " & quotedDropped & " | sed s/[[:digit:]][[:digit:]]*/#\\ " & pageCount & "/"
log "toUnix is " & toUnix
set fromUnix to do shell script toUnix
log "sed output is " & fromUnix



(*quotedDropped is 'Macintosh HD:Users:mac:Desktop:Tax - 0002.jpg'*)
(*toUnix is echo 'Macintosh HD:Users:mac:Desktop:Tax - 0002.jpg' | sed s/[[:digit:]][[:digit:]]*/#\ 3/*)
(*sed output is Macintosh HD:Users:mac:Desktop:Tax - # 3.jpg*)
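The sed command above gets by without any non-greedy trick, which matters because POSIX regular expressions (what sed uses) do not define a lazy .*? quantifier. As a cross-check, the sketch below reproduces the same substitution in Python. The value 3 comes from the log; page_count is just a stand-in for the script's pageCount, and [0-9]+ replaces [[:digit:]][[:digit:]]*, which also means "one or more digits".

```python
import re

dropped = "Macintosh HD:Users:mac:Desktop:Tax - 0002.jpg"  # value shown in the log
page_count = 3  # stand-in for the script's pageCount; 3 is the value in the log

# [0-9]+ matches one or more digits, like [[:digit:]][[:digit:]]* in the sed command.
# sed's s/// without the g flag replaces only the first match, hence count=1.
result = re.sub(r"[0-9]+", f"# {page_count}", dropped, count=1)
print(result)  # Macintosh HD:Users:mac:Desktop:Tax - # 3.jpg
```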

