How do I find the character of a substring in the middle of a string?

I have shared the steps to reproduce the error, sample code, and an additional text file.


iPhone XR

Posted on Aug 18, 2023 6:44 PM

23 replies

Aug 18, 2023 9:54 PM in response to maadhyamik

I’m not in a position to check the Tamil encoding right now, but I suspect the following is why you are having issues…


s.range (Range<Int>) doesn’t account for multi-byte characters, while index does:

https://docs.swift.org/swift-book/documentation/the-swift-programming-language/stringsandcharacters/


Here’s a pretty good explanation of the underlying issue:

https://stackoverflow.com/questions/39676939/how-does-string-index-work-in-swift/39676940#39676940
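
To illustrate the point about integer offsets versus String.Index, here is a minimal sketch. The sample word is an assumption (a Tamil word written with Unicode escapes so that it survives the forum filtering); it is not the original poster's string. It shows that the same string has different lengths depending on the view, and that a position "in the middle" has to be built by walking from startIndex rather than by using a plain Int.

```swift
// Assumed sample word, written as Unicode escapes (not the original poster's data).
let word = "\u{0B85}\u{0BB5}\u{0BB0}\u{0BCD}\u{0B95}\u{0BB3}\u{0BCD}"

// The same string measured three ways.
print(word.count)                 // user-perceived characters (grapheme clusters)
print(word.unicodeScalars.count)  // Unicode scalars (code points)
print(word.utf8.count)            // UTF-8 bytes

// String cannot be subscripted with an Int. Build a String.Index by
// walking forward from startIndex by a number of Characters instead.
let middle = word.index(word.startIndex, offsetBy: word.count / 2)

print(word[middle])     // the Character at that position, combining marks included
print(word[middle...])  // everything from that Character to the end
```

The Character count is smaller than the scalar count whenever combining marks are present, which is why arithmetic on integer offsets tends to land in the wrong place for text like this.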



Aug 19, 2023 5:44 PM in response to maadhyamik

As MrHoffman suggests, this is probably an encoding problem.


Most example code is still going to be Anglo-centric. That kind of logic won't work with anything other than basic ASCII.


What you want is to avoid low-level code like this. Always assume strings are Unicode. You can accomplish the same thing with something higher level, like regex. Here is an example:


<sigh> Oh, to be in 2023...</sigh>


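var s = "Hello, world" // Replace with Tamil
var ss = "worl"

// Regex(_:) parses the pattern at run time and throws if it is invalid, so it needs `try`.
// Note: ss is inserted into the pattern as-is, so regex metacharacters in it are not escaped.
var regex = try Regex("(\(ss).*)")
if let r2 = s.firstMatch(of: regex)
  {
  print(s[r2.range]) // the matched text: from ss to the end of the line
  }
else
  {
  print ("ERROR: Substring \(ss) not found in String \(s)!")
  }

ss = "****" // Replace with Tamil

regex = try Regex("(\(ss).*)")

if let r1 = s.firstMatch(of: regex)
  {
  print(s[r1.range])
  }
else
  {
  print ("ERROR: Substring \(ss) not found in String \(s)!")
  }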


For the record, I did test this with Tamil strings. I just can't post them, not even with "Additional text".


LOL! And I even got censored!

Aug 18, 2023 7:34 PM in response to maadhyamik

The Tamil {can’t post it here as text, per the forum filtering}

is definitely not "ள௠" or whatever is showing up here. It seems the Apple forum Additional Text function, or whatever tool or mechanism is being used to insert or paste the (Unicode?) text here is not preserving things. And it seems the forum is blocking Tamil text in the English communities. Which is going to make answering this that much more interesting.


Which character of that is failing?


Can you post the Swift code (without the Tamil) here, using the <> code tags, as a starting point?


KiltedTim: this thread has now been relocated.

Aug 18, 2023 7:54 PM in response to MrHoffman

Please refer to the shared image. I found the problem only in the middle of the substring.

import Foundation // range(of:) on String comes from Foundation

var s = "replace tamil character s"
var ss = "replace tamil character ss"
if let r2 = s.range(of: ss)?.lowerBound {
    print(s[r2...])
} else {
    print("ERROR: Substring",ss,"not found in String",s,"!")
}

ss = "replace new tamil character ss"
if let r1 = s.range(of: ss)?.lowerBound {
    print(s[r1...])
} else {
    print("ERROR: Substring",ss,"not found in String",s,"!")
}

Aug 20, 2023 4:47 PM in response to MrHoffman

MrHoffman wrote:

It would be really helpful to be able to post the actual string around here, but I digress.

Absolutely. I even tried to save the original image and use the built-in OCR to read it. It worked great for all but the Tamil. This site just isn't ready for the world outside California.

If there’s a combining dot (somewhere), I’d have to wonder if it’s Unicode normalization in play.

That's what's interesting about this. The OP tried a basic operation that they expected, for good reason, to have worked. But even at the NSString/Objective-C levels, there are strings and then there are "foreign" strings. It is necessary to pass an option to ignore diacritics altogether in order to perform a search. The localizedCaseInsensitiveContains function also does that, as well as ignore case. But that really isn't the correct answer. It's fine if one wants to search while ignoring diacritics, but what if you don't want to do that? What if you want to detect the presence of that dot? It looks like Regex is the only way. (To be fair, I didn't test other, purely Objective-C methods.)
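
As a rough illustration of the trade-off described above, the sketch below compares a default Foundation search with a diacritic-insensitive one. It uses a Latin sample (an assumption chosen because its behavior is easy to verify), not the Tamil strings from this thread, and it does not claim to reproduce the original failure.

```swift
import Foundation

// Assumed sample strings: "résumé" written with escapes, and the unaccented form.
let text = "r\u{E9}sum\u{E9}"
let plain = "resume"

// Default search: diacritics are significant, so the unaccented form is not found.
let exact = text.range(of: plain)

// Diacritic-insensitive search: the accents are ignored and the match succeeds.
let loose = text.range(of: plain, options: .diacriticInsensitive)

print(exact == nil ? "default search: not found" : "default search: found")
print(loose == nil ? "diacritic-insensitive: not found" : "diacritic-insensitive: found")
```

The insensitive search answers "is this word here, give or take its marks?", which is a different question from "is this exact mark present?".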


I've been doing an awful lot in Swift and SwiftUI lately. I think I'm coming to a realization of how Apple works internally. I think they are really, and I mean obsessively, deadline-driven. An Apple developer gets done what they get done by the deadline, and then it's done. Maybe they get reassigned. Maybe it gets rewritten next year. But nothing ever gets refined. Nothing ever gets "polished".

Aug 20, 2023 6:14 AM in response to maadhyamik

maadhyamik wrote:

I got the solution: I used s.range(of: ss1, options: .diacriticInsensitive) to get the range, and it seems to work.

That seems to require use of Foundation and NSString. Isn't that kind of like cheating, at least as far as Swift goes? My first inclination was to see how I would do this in Perl. That is what led me to the Regex solution. But I guess Regex is a cheat too, isn't it? It is using PCRE underneath.


It will never fail to amuse me that Apple can't figure out how to handle Unicode in Swift, simply proclaims it to be impossible, and gives up. Poor developers are then left to use solutions that rely on older technologies in different languages (Foundation in Objective-C or PCRE in C) that never got the message that Unicode was impossible.
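
For comparison, a search that uses neither Foundation nor Regex can be sketched by comparing Unicode scalars directly. The helper name scalarRange(of:in:) is hypothetical, and the two sample words are assumed spellings written as Unicode escapes; this is a sketch of one possible approach, not something tested against the original poster's data.

```swift
/// Hypothetical helper (not from the thread): finds `needle` inside `haystack`
/// by comparing Unicode scalars one by one, using only the standard library.
func scalarRange(of needle: String, in haystack: String) -> Range<String.Index>? {
    let h = haystack.unicodeScalars
    let n = Array(needle.unicodeScalars)
    guard !n.isEmpty else { return nil }

    var start = h.startIndex
    while start != h.endIndex {
        var cursor = start
        var matched = 0
        // Advance while the scalars keep agreeing.
        while matched < n.count, cursor != h.endIndex, h[cursor] == n[matched] {
            cursor = h.index(after: cursor)
            matched += 1
        }
        if matched == n.count { return start..<cursor }
        start = h.index(after: start)
    }
    return nil
}

// Assumed sample words, written as Unicode escapes.
let longWord = "\u{0B85}\u{0BB5}\u{0BB0}\u{0BCD}\u{0B95}\u{0BB3}\u{0BCD}"
let shortWord = "\u{0B95}\u{0BB3}\u{0BCD}"

if let range = scalarRange(of: shortWord, in: longWord) {
    // Slice the scalar view so the result does not depend on grapheme boundaries.
    print(String(longWord.unicodeScalars[range.lowerBound...]))
} else {
    print("not found")
}
```

A scalar-level comparison applies no normalization and can match in the middle of a grapheme cluster, so it detects the exact code points (including a combining mark) rather than user-perceived characters.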

Aug 20, 2023 11:34 AM in response to maadhyamik

Scanning app source code out of an image doesn't always produce working code.


Here is a standalone Swift command-line module built with Xcode 14.3.1 on macOS 13.4.1. (Unpolished code. The repeated blocks should be func's. Etc. But here we are.)


This code works correctly with both the Unicode shown, and with the Tamil string and substring replaced.



import Foundation

let str = "abc1️⃣2️⃣3️⃣xyz" // Tamil string goes here
var substr: String

substr = "x" // Tamil substring goes here
if let range = str.range(of:substr) {
    let substring = str[..<range.lowerBound]
    print("String ", substr," present")
}
else {
    print("String ", substr, " not present")
}

substr = "1️⃣"  // another test case
if let range = str.range(of:substr) {
    let substring = str[..<range.lowerBound]
    print("String ", substr," present")
}
else {
    print("String ", substr, " not present")
}


My preference for posting reproducers is to require as little as reasonably possible from those who might answer the question. Creating a standalone reproducer also forces me to distill the error to minimal code, which can expose latent bugs.

Aug 20, 2023 2:25 PM in response to MrHoffman

MrHoffman wrote:

This code works correctly with both the Unicode shown, and with the Tamil string and substring replaced.

The code does not work correctly with the Tamil string. It is difficult because it has a combining dot over two characters. The only way I managed to get it input was to manually type the characters without the dot and do a Google search for it. That returned results that did have the dot, which I could copy and paste. Luckily, Tamil is not Chinese and there is a limited number of characters to choose from.

Aug 22, 2023 2:47 AM in response to MrHoffman

MrHoffman wrote:

Can somebody base64-encode the Tamil text strings and post it, so I have the right text for further testing?
base64 --input=infile --output=outfile
base64 --decode --input=infile --output=outfile

If you are talking about the original text, I can tell you how to type it using the Tamil Anjal input source:


The top one is avarkaL, the bottom one is kaL.
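
The base64 round trip requested above can also be done from Swift, which avoids pasting the text through a shell. This is a minimal sketch: the helper names are hypothetical, and the sample is an assumed spelling of the word transliterated above as kaL, written with Unicode escapes.

```swift
import Foundation

/// Hypothetical helper: encodes a string's UTF-8 bytes as base64.
func base64Text(from string: String) -> String {
    Data(string.utf8).base64EncodedString()
}

/// Hypothetical helper: decodes base64 back into a string, or nil if the input is invalid.
func text(fromBase64 encoded: String) -> String? {
    guard let data = Data(base64Encoded: encoded) else { return nil }
    return String(data: data, encoding: .utf8)
}

// Assumed spelling of "kaL", as Unicode escapes.
let sample = "\u{0B95}\u{0BB3}\u{0BCD}"

let encoded = base64Text(from: sample)
print(encoded) // plain ASCII, safe to post

if let decoded = text(fromBase64: encoded) {
    // Compare scalars so that the check is exact rather than normalization-tolerant.
    print(Array(decoded.unicodeScalars) == Array(sample.unicodeScalars))
}
```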




Aug 22, 2023 5:33 AM in response to Tom Gewecke

Tom Gewecke wrote:

That's the Tamil virama character, made with the F key on the Tamil99 input source after whatever character it goes over. (It just indicates that the consonant is pronounced on its own, instead of with an inherent a vowel.)

I refuse to even use the iOS-style press-and-hold input method for French! I ain't trying no funky keyboards!


That was the character I was looking at in the keyboard viewer, but I didn't know how to type it. I suppose I could have manually constructed the UTF-8, but I didn't want to work that hard. I think my Google search method worked pretty good.
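
For anyone who would rather not hunt for the right keyboard, the string can be assembled from code points or from raw UTF-8 bytes. The sketch below assumes the three scalars U+0B95, U+0BB3, and the Tamil virama U+0BCD; that spelling is an assumption, not the original poster's exact text.

```swift
// The Tamil virama (the "dot"), as a single Unicode scalar.
let virama: Unicode.Scalar = "\u{0BCD}"

// Build the string scalar by scalar.
var scalars = String.UnicodeScalarView()
scalars.append("\u{0B95}")
scalars.append("\u{0BB3}")
scalars.append(virama)
let fromScalars = String(scalars)

// The same text, constructed by hand from its UTF-8 bytes.
let bytes: [UInt8] = [0xE0, 0xAE, 0x95, 0xE0, 0xAE, 0xB3, 0xE0, 0xAF, 0x8D]
let fromBytes = String(decoding: bytes, as: UTF8.self)

// Detect the dot by looking at scalars rather than Characters.
let hasVirama = fromScalars.unicodeScalars.contains(virama)
print(hasVirama)

// List each scalar with its code point and Unicode name.
for scalar in fromBytes.unicodeScalars {
    let hex = String(scalar.value, radix: 16, uppercase: true)
    print("U+\(hex)", scalar.properties.name ?? "(unnamed)")
}
```

Dumping the scalars this way is also a quick check on whether a string copied from a web page carries the combining mark at all.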

Feb 9, 2024 4:18 AM in response to etresoft

I remember, as a member of a development team, knowing that my next performance review would be based on how well I met my forecasted deliverables within a 15-minute window of when I said they would be done. No pressure there…


I believe that macOS has become so unnecessarily complicated that polish is too costly in terms of schedule or available resource planning.

