How to get integer value from a string
I have some strings which contain numbers and other characters, such as "1 920 pixels". What I want is to get 1920 as an integer.
Thanks in advance.
There's no problem with adding other punctuation (?, ! and so on), and I can't see any reason for thinking that grep is preferable here over sed (quite the reverse; as you can see, sed makes much shorter work of it).
I deliberately didn't add . or , because you want to retain the ability to capture floats/decimals as well as comma-separated numbers like "1,920".
Sure, there are other edge cases (maybe we want colons to capture time statements like 10:19:01), but until the OP is clearer about what exactly he wants there's no point in covering every possible case. Moreover, any detritus left over in the result can easily be filtered out with AppleScript using text item delimiters or offsets.
Mark and Phil,
OS X 10.9.4. Bash. No pcre. More portable than previous. It handles optional U.S. currency notation with or without trailing spaces before optional punctuated integer or decimal values. The ".," provides support for North American or European currency punctuation. It will handle decimal fractions (e.g. $.45). It will handle sentences such as the one Phil supplied earlier and extract only the numeric values.
egrep -o "([^[:alpha:]., ]+[$ ]*[[:digit:],.]+|[.,]*[[:digit:]]+)"
The RE works fine for a test file where the values are one to a line, or a single sentence with two numbers, the latter ending a sentence with a period. I made a second test file where I joined the previous data to reflect multiple comma or space separated values on individual lines. Here, it gets some things right, and mangles others. I have a headache now, so this is shelved for the time being.
I changed my test criteria to the following data file contents. This egrep RE parses every datum in the test file correctly with one item per output line.
egrep -o "([^[:space:][:alpha:].,]+?[$ ]*[[:digit:],.]+[^[:space:]]+?[^[:punct:][:alpha:][:space:]]+[\.$]?|[.,]*[[:digit:]]+[.]?)" < nbr2.txt
The data file:
The value of 999,999,999,999.45 is not 125000000, or 1,000,000, or 100000, or 100,000.27.
99999 50,137.15 10,000 1,920 100.37 45.45 10.99 1.99
10 1 .1 .0045
"any number you like, (e.g., 1,920) but don't put a $ sign in front of it like $1,920."
$15,000.45 $100000.45 $ 100.25 $ 2,000.56 $ 3000.45 $.045
$ ,045 .045.
Hi VikingOSX,
It's not working correctly with the standard grep in OSX 10.6.
Example:
.
.
100,000.27.
99999 50,137.15
.
.
I'll play with it when I can find some time. This should be moved to its own thread.
@ Phil Stokes
Phil Stokes wrote:
...but until the OP is clearer about what exactly he wants there's no point in covering ever possible case.
I thought that this had gone off topic because it didn't fit Michael's requirement. Anyway, from the shell's perspective this would meet his requirement as stated.
sed 's/[^0-9]//g' <<<"1 920 pixels"
Well, you still have to turn it into an integer.
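One way to finish that last step in the shell is sketched below. This is only an illustration, not something posted in the thread: the function name to_integer is hypothetical, and it uses tr instead of sed to drop the non-digits. Leading zeros are stripped so that shell arithmetic does not treat the result as octal.

```shell
# Hypothetical helper: keep only the digits of a string and print them as an integer.
to_integer() {
    # tr -cd deletes every character that is not in the set 0-9
    digits=$(printf '%s' "$1" | tr -cd '0-9')
    # Drop leading zeros so shell arithmetic does not read the value as octal
    digits=$(printf '%s' "$digits" | sed 's/^0*//')
    # An empty result (no digits, or only zeros) is reported as 0
    printf '%s\n' "${digits:-0}"
}

n=$(to_integer "1 920 pixels")
# The value can now be used in arithmetic
echo "$((n + 1))"
```

Note that, like the sed one-liner, this glues every digit in the string together, so it only suits strings that hold a single number.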
Mark,
Probably differences between the GNU grep on Snow Leopard, and the BSD grep on Mavericks. I deliberately captured the trailing period in 100,000.27. and likely should not. The 99999 50,137.15 sequence gave me fits before I finally dealt with that single space.
/VikingOSX
V.........,
Well, you made short work of it. A minor tweak for gnu grep->
egrep -o "([^[:space:][:alpha:].,]+?[*$*]*[[:digit:],.]+[^[:space:]]+?[^[:punct:][:alpha:][:space:]]+[\.$]?|[.,]*[[:digit:]]+[.]?)"
Good work (let the trailing period be).
set s to quoted form of "He's got 3 cats and 1,920 dogs and $20,000"
do shell script "sed s/[a-zA-Z\\']//g <<< " & s
set dx to the result
set numlist to {}
repeat with i from 1 to count of words in dx
set this_item to word i of dx
try
set this_item to this_item as number
set the end of numlist to this_item
end try
end repeat
numlist
Brilliant ! That's exactly what I need!
@all
Many thanks to all of you!
Programming language?
egrep -o "(\d{1,3},?\d{1,3},?\d{3}|\d+)" <<< "See spot 1920 run 10,000 circles around Jane 999 who has fainted 1 time."
Output
1920
10,000
999
1
Forgot to mention. Sorry for that. 😮
I want to get an integer from a string in AppleScript.
It works well when a string has numbers only, but in other cases, it may crash.
Long enough as it is.
Well, I'll mention that "\d" isn't portable. Of course replacing the "\d" with [[:digit:]] is portable...:-)
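As a rough sketch of that substitution, here is a variant that uses only [[:digit:]]. It is not the exact expression posted earlier: the comma-grouping alternative is simplified, and the function name extract_numbers is made up for the example. The -o option is not in POSIX, but both GNU and BSD grep accept it.

```shell
# Hypothetical helper: print each number found in the argument on its own line.
extract_numbers() {
    # First alternative: 1-3 digits followed by one or more ",ddd" groups (e.g. 10,000)
    # Second alternative: any plain run of digits (e.g. 1920)
    printf '%s\n' "$1" | grep -E -o '[[:digit:]]{1,3}(,[[:digit:]]{3})+|[[:digit:]]+'
}

extract_numbers "See spot 1920 run 10,000 circles around Jane 999 who has fainted 1 time."
```

For that sentence it should print 1920, 10,000, 999 and 1, one per line.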
Excellent! What if we remove the period if the last "word" of the string is numeric.
egrep -o "([^[:alpha:]., ]+[$ ]*[[:digit:],.]+|[.,]*[[:digit:]]+[^\.$])"
Mark,
Updated to remove trailing period behind last numeric word in sentence.
egrep -o "([^[:alpha:]., ]+[$ ]*[[:digit:],.]+[^[:punct:]\.$]+|[.,]*[[:digit:]]+)"