Using triple quotes & regex in Python to solve the problem of single/double quotes in strings

A Python string contains a height in the imperial system. What could go wrong?

Photo by Markus Spiske on Unsplash

Let’s start by first answering the question in the subtitle. Many things could go wrong! Let me quickly take you through a few scenarios using an example.

Let’s say we want to store the following line as a Python string: Minimum height of 5'8" and minimum weight of 65 kgs. It sounds like a perfectly normal English statement, one you might use to describe someone or to list some requirements. But here’s the catch. The height is not given in centimeters or meters. We’re using the imperial system, which uses feet and inches. Numerically, a single quote represents feet and a double quote represents inches, so a height of 5'8" means 5 feet and 8 inches. Python, however, also uses quotes to delimit strings. So here’s what happens when we try to use double quotes to represent this sentence as a string.

Can’t use double quotes

We get a SyntaxError! That’s because the double quote after the number 8 tells Python that the string ends there but, in reality, it does not. Let’s try using a single quote and see.

Can’t use single quotes

We get another SyntaxError. This time because of the single quote right after the number 5. So what can we do next?
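Both failures can be reproduced in a script that itself stays valid. This is only a sketch: the names sentence, double_quoted, single_quoted and try_compile are made up for illustration, and the built-in compile() is used to parse each broken assignment without crashing the script.

```python
# Assemble the sentence from two pieces so that this script itself is valid:
# the first piece holds the single quote, the second holds the double quote.
sentence = "Minimum height of 5'8" + '" and minimum weight of 65 kgs'

# The source code we would have typed, wrapped in each kind of quote.
double_quoted = 'description = "' + sentence + '"'
single_quoted = "description = '" + sentence + "'"


def try_compile(source):
    """Return 'ok' if the source parses, 'SyntaxError' otherwise."""
    try:
        compile(source, "<example>", "exec")
        return "ok"
    except SyntaxError:
        return "SyntaxError"


print(try_compile(double_quoted))  # SyntaxError
print(try_compile(single_quoted))  # SyntaxError
```

In both cases Python closes the string at the first matching quote it meets, and the leftover text no longer parses.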

Use triple quotes! Many online tutorials mention triple quotes only in the context of comments: use # for single-line comments and triple quotes for multi-line comments. But guess what? A triple-quoted block is really just a string literal, and it can span multiple lines. Because such a string ends only at a matching run of three quotes, a lone single or double quote inside it no longer ends the string early.

So let us now use triple quotes and see what happens.

Triple quotes to the rescue!

And it worked! We were finally able to create the string description with the statement.
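In code, the triple-quoted version looks roughly like this (the variable name description follows the text above):

```python
# Triple double quotes delimit the string, so the lone ' and " inside
# are treated as ordinary characters.
description = """Minimum height of 5'8" and minimum weight of 65 kgs"""

print(description)  # Minimum height of 5'8" and minimum weight of 65 kgs
```

One thing to watch for: if the text itself ended with a double quote, it would run into the closing """. In that case, triple single quotes ('''...''') are the safer choice.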

Even though we solved our problem using triple quotes, most people would still prefer to use single or double quotes to create their strings. How do we remove the single/double quotes and still keep the statement without losing much of its meaning? In common parlance, I’ve seen people write 5'8" as simply 5.8 and be done with it. This is not entirely accurate (12 inches make a foot, so 8 inches is about 0.67 feet, not 0.8), but it gets the point across, especially when it is meant to be read by a human. So, let’s try to replace these quotes with a decimal point.

We use Python’s regex library re to implement the solution. The pattern we want is two numbers separated by a single quote, with the second number followed by a double quote. In regex, you would write it as \d+\'\d+\". We use re.sub to substitute wherever we find such a pattern in the string. Let’s see in the image what happens when we do that.

Removed double quotes but the numbers are gone too!

We were successfully able to remove only the part that has the pattern! But wait. There’s a problem. We removed all the numbers too! We didn’t tell regex what it should put in place of the replaced pattern.
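Here is a sketch of that first attempt. It assumes the replacement passed to re.sub was an empty string, which is what the result described above suggests.

```python
import re

description = """Minimum height of 5'8" and minimum weight of 65 kgs"""

# One or more digits, a single quote, one or more digits, a double quote.
# The backslashes before the quotes are harmless to the regex engine and
# let the raw string contain a single quote.
pattern = r'\d+\'\d+\"'

# Assumption: the replacement is empty, so the whole match is deleted.
result = re.sub(pattern, "", description)

print(result)  # Minimum height of  and minimum weight of 65 kgs
```

The entire match, digits included, is swapped for the empty string, which is why the height disappears and two spaces are left behind.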

Actually what we need to do is as follows:
1. Remove the quotes.
2. Keep the numbers as they are.
3. Add a decimal point in between the numbers.

The first point is taken care of by the regex pattern. To keep the numbers as they are, we use what is known as regex grouping and back-reference. We group each number in the pattern by wrapping it in parentheses, and then use \1 and \2 in the replacement to indicate that we want to keep these numbers as they are in the new string. Using \1 and \2 to tell re.sub to reuse the matched numbers is called back-reference (since we’re referencing characters that are already present). Lastly, we add a decimal point between these two back-references to complete the replacement pattern. Take a look at the image below to see the example.
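A minimal sketch of the grouped version, reusing the same description string:

```python
import re

description = """Minimum height of 5'8" and minimum weight of 65 kgs"""

# Parentheses capture the feet as group 1 and the inches as group 2.
pattern = r'(\d+)\'(\d+)\"'

# \1 and \2 put the captured numbers back, with a decimal point between them.
result = re.sub(pattern, r"\1.\2", description)

print(result)  # Minimum height of 5.8 and minimum weight of 65 kgs
```

The replacement is written as a raw string so that \1 and \2 reach re.sub as back-references instead of being read as Python escape sequences.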

Regex grouping and back-reference

And that’s it! We’ve now successfully tackled the presence of single and double quotes in our string and made a good-enough replacement that humans can understand. Is there a better way to do this? I’d like to learn! Please let me know in the comments below or reach out to me on LinkedIn:
https://www.linkedin.com/in/imaad-mohamed-khan-218b3999/.
And while you’re at it, do check out my YouTube channel too. I have some interesting videos: https://www.youtube.com/channel/UC6VPXglDoZYMOj2kr-flNJQ
