How to get integer value from a string
I have some strings which contain number and other characters, such as "1 920 pixels". What I want is to get 1920 as integer.
Thanks in advance.
I couldn't get your dollar version to work, which forced me to dust off my old Sed & Awk book. I think I've got the whole thing now, and it's much more compact.
Can anyone find any flaws in this one?
set s to quoted form of "he has got 3 cats and 1,920 dogs and $20,000"
do shell script "sed s/[a-z]//g <<< " & s
Escape the dollar sign.
[KSH_93u+]:tmp $ sed s/[a-z]//g <<<"he has got 3 cats and 1,920 dogs and \$20,000"
3 1,920 $20,000
If you want to filter both a-z and A-Z, then:
[KSH_93u+]:tmp $ sed 's/[[:alpha:]]//g' <<< "He has got 3 cats and 1,920 dogs and \$20,000"
Escaping the dollar sign isn't allowed (I'm assuming), since that's going to be raw input from the user, and in any case it's unnecessary because it's already taken care of by AppleScript's 'quoted form of' syntax. 😉
You're right, I forgot to account for caps, and we all forgot about apostrophes, so it should now be:
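For anyone following along in Terminal rather than AppleScript, here is a rough shell analogue of the single-quote escaping that 'quoted form of' is generally described as doing: wrap the text in single quotes and rewrite each embedded ' as '\''. It is a sketch, not Apple's implementation, and the function name shell_quote is hypothetical. It shows why a $ in user input needs no extra escaping once the whole string is single-quoted.

```sh
# Hypothetical helper, a rough analogue of AppleScript's "quoted form of":
# wrap the argument in single quotes so the shell performs no expansion,
# and turn every embedded ' into '\'' (close quote, escaped quote, reopen).
shell_quote() {
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

s="He's got 3 cats and 1,920 dogs and \$20,000"
q=$(shell_quote "$s")
printf '%s\n' "$q"         # -> 'He'\''s got 3 cats and 1,920 dogs and $20,000'
eval "printf '%s\n' $q"    # the shell sees the original text again
```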
set s to quoted form of "He's got 3 cats and 1,920 dogs and $20,000"
do shell script "sed s/[a-zA-Z\\']//g <<< " & s
Result:
" 3 1,920 $20,000"
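The same filter can be tried directly in a POSIX shell. This is a sketch assuming the goal is the tidy result shown above: [[:alpha:]] covers both upper- and lower-case letters, the apostrophe is added to the bracket expression, and three extra sed expressions (an addition, not part of the posted command) squeeze the runs of spaces that deleting words leaves behind and trim the ends. The function name strip_words is hypothetical.

```sh
# Hypothetical helper: delete letters and apostrophes, then collapse the
# leftover runs of spaces to a single space and trim both ends.
strip_words() {
  printf '%s\n' "$1" | sed -e "s/[[:alpha:]']//g" \
                           -e 's/  */ /g' \
                           -e 's/^ //' -e 's/ $//'
}

strip_words "He's got 3 cats and 1,920 dogs and \$20,000"
# -> 3 1,920 $20,000
```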
What about other punctuation that may appear in your string? The extended REs in grep are better for handling this. I'm guessing that whatever version of grep you and V... are using must be compiled with PCRE.
There's no problem with adding other punctuation (?, ! and so on), and I can't see any reason for thinking that grep is preferable to sed here (quite the reverse: as you can see, sed makes much shorter work of it).
I deliberately didn't add . or , because you want to retain the ability to capture floats/decimals as well as comma-separated numbers like "1,920".
Sure, there are other edge cases (maybe we want colons, to capture time statements like 10:19:01), but until the OP is clearer about exactly what he wants, there's no point in covering every possible case. Moreover, any detritus left over in the result can easily be filtered out in AppleScript using text item delimiters or offsets.
Mark and Phil,
OS X 10.9.4, Bash, no PCRE. This is more portable than the previous version. It handles optional U.S. currency notation, with or without spaces after the $, before optional punctuated integer or decimal values. The ".," in the bracket expressions provides support for North American or European currency punctuation. It will handle decimal fractions (e.g. $.45). It will also handle sentences such as the one Phil supplied earlier and extract only the numeric values.
egrep -o "([^[:alpha:]., ]+[$ ]*[[:digit:],.]+|[.,]*[[:digit:]]+)"
Excellent! What if we remove the period when the last "word" of the string is numeric?
egrep -o "([^[:alpha:]., ]+[$ ]*[[:digit:],.]+|[.,]*[[:digit:]]+[^\.$])"
Mark,
Updated to remove the trailing period after the last numeric word in a sentence.
egrep -o "([^[:alpha:]., ]+[$ ]*[[:digit:],.]+[^[:punct:]\.$]+|[.,]*[[:digit:]]+)"
The RE works fine for a test file where the values are one per line, or for a single sentence containing two numbers where the sentence ends with a period. I made a second test file in which I joined the previous data, so that individual lines hold multiple comma- or space-separated values. Here, it gets some things right and mangles others. I have a headache now, so this is shelved for the time being.
I changed my test criteria to the following data file contents. This egrep RE parses every datum in the test file correctly with one item per output line.
egrep -o "([^[:space:][:alpha:].,]+?[$ ]*[[:digit:],.]+[^[:space:]]+?[^[:punct:][:alpha:][:space:]]+[\.$]?|[.,]*[[:digit:]]+[.]?)" < nbr2.txt
The data file:
The value of 999,999,999,999.45 is not 125000000, or 1,000,000, or 100000, or 100,000.27.
99999 50,137.15 10,000 1,920 100.37 45.45 10.99 1.99
10 1 .1 .0045
"any number you like, (e.g., 1,920) but don't put a $ sign in front of it like $1,920."
$15,000.45 $100000.45 $ 100.25 $ 2,000.56 $ 3000.45 $.045
$ ,045 .045.
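For comparison, here is a deliberately simpler POSIX ERE sketch (not VikingOSX's expression). It accepts a comma only as a thousands separator, i.e. a comma followed by exactly three digits, and requires at least one digit after a decimal point; those two rules are what keep a sentence-ending period or a list comma out of the match. It assumes grep's -o and -E options, which both GNU and BSD grep provide. The function name extract_numbers is hypothetical, and space-separated groups such as the "1 920" in the original question come out as two separate numbers.

```sh
# Hypothetical helper: print every number found on stdin, one per line.
#   (\$ ?)?             optional dollar sign, optionally followed by one space
#   [0-9]+(,[0-9]{3})*  digits with optional ",ddd" thousands groups
#   (\.[0-9]+)?         optional decimal part (at least one digit after the dot)
#   |(\$ ?)?\.[0-9]+    or a bare decimal fraction such as .45 or $.045
extract_numbers() {
  grep -oE '(\$ ?)?[0-9]+(,[0-9]{3})*(\.[0-9]+)?|(\$ ?)?\.[0-9]+'
}

printf '%s\n' "He's got 3 cats and 1,920 dogs and \$20,000" | extract_numbers
```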
Hi VikingOSX,
It's not working correctly with the standard grep in OS X 10.6.
Example:
.
.
100,000.27.
99999 50,137.15
.
.
I'll play with it when I can find some time. This should be moved to its own thread.
@ Phil Stokes
Phil Stokes wrote:
...but until the OP is clearer about what exactly he wants there's no point in covering every possible case.
I thought this had gone off topic because it didn't fit Michael's requirement. Anyway, from the shell's perspective, this would meet his requirement as stated.
sed 's/[^0-9]//g' <<<"1 920 pixels"
Well, you still have to turn it into an integer.
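In a shell script, the digit string from that sed command becomes a number once it goes through arithmetic expansion. The sketch below (the function name to_int is hypothetical) also strips leading zeros, since shell arithmetic such as bash's can read a leading 0 as an octal constant, and it returns a failure status when the string contains no digits at all. Note that it simply glues every digit together, so "3 cats and 1,920 dogs" would become 31920; it only suits inputs holding a single number, like the original "1 920 pixels".

```sh
# Hypothetical helper: keep only the digits, drop leading zeros (so the
# value is not read as octal), and convert with arithmetic expansion.
to_int() {
  digits=$(printf '%s\n' "$1" | sed -e 's/[^0-9]//g' -e 's/^0*\([0-9]\)/\1/')
  [ -n "$digits" ] || return 1   # no digits at all: report failure
  printf '%s\n' "$((digits))"
}

n=$(to_int "1 920 pixels")
printf '%s\n' "$((n + 80))"      # -> 2000
```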
Mark,
It's probably down to differences between GNU grep on Snow Leopard and BSD grep on Mavericks. I deliberately captured the trailing period in "100,000.27." and likely should not have. The "99999 50,137.15" sequence gave me fits before I finally dealt with that single space.
/VikingOSX
Mark said:
Well, you still have to turn it into an integer.
Good point. I had forgotten that in all the fun...
This should do it, I think:
set s to quoted form of "He's got 3 cats and 1,920 dogs and $20,000"
do shell script "sed s/[a-zA-Z\\']//g <<< " & s
set dx to the result
set numlist to {}
repeat with i from 1 to count of words in dx
set this_item to word i of dx
try
set this_item to this_item as number
set the end of numlist to this_item
end try
end repeat
numlist
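The AppleScript loop above keeps only the words that coerce to a number. A rough POSIX shell counterpart follows; it is a sketch, not a translation of AppleScript's word-splitting or coercion rules. After the same letter-and-apostrophe stripping, it splits on whitespace, deletes any "$" and grouping commas from each token, and keeps the tokens that are then made only of digits. The name numbers_from is hypothetical, and decimals such as 100.37 are deliberately skipped because the goal here is integers.

```sh
# Hypothetical helper: print each whole number in the text, one per line,
# with "$" signs and grouping commas removed (1,920 -> 1920).
# Pipeline: strip letters/apostrophes, put one token per line, then filter.
numbers_from() {
  printf '%s\n' "$1" |
    sed "s/[[:alpha:]']//g" |
    tr -s ' \t' '\n\n' |
    while IFS= read -r tok; do
      bare=$(printf '%s' "$tok" | tr -d '$,')
      case $bare in
        ''|*[!0-9]*) ;;   # empty, or not purely digits: skip it
        *) printf '%s\n' "$bare" | sed 's/^0*\([0-9]\)/\1/' ;;
      esac
    done
}

numbers_from "He's got 3 cats and 1,920 dogs and \$20,000"
# -> 3, 1920 and 20000, one per line
```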
V.........,
Well, you made short work of it. A minor tweak for GNU grep:
egrep -o "([^[:space:][:alpha:].,]+?[*$*]*[[:digit:],.]+[^[:space:]]+?[^[:punct:][:alpha:][:space:]]+[\.$]?|[.,]*[[:digit:]]+[.]?)"
Good work (let the trailing period be).
set s to quoted form of "He's got 3 cats and 1,920 dogs and $20,000"
do shell script "sed s/[a-zA-Z\\']//g <<< " & s
set dx to the result
set numlist to {}
repeat with i from 1 to count of words in dx
set this_item to word i of dx
try
set this_item to this_item as number
set the end of numlist to this_item
end try
end repeat
numlist
Brilliant! That's exactly what I need!
@all
Many thanks to all of you!