Manipulating strings
Validating user input
When the input stream comes from a user, we cannot guarantee that they will enter the data exactly as prompted. In practice, your program must validate or manipulate user input to meet certain requirements. Otherwise, your program may run into unexpected behavior.
Some common manipulations you may need to make to input data include:
- Changing the case of text
- Splitting data by a delimiter
- Ensuring that the input contains some data
- Truncating long data
To remain short, many example code blocks in future units will not show the process of validating standard input. Now you learn about string methods that can be used to validate and manipulate all kinds of strings.
Strings are immutable
Strings are immutable, which means a string value cannot be changed. The variable can be reassigned, but the value of the string itself cannot be modified. Methods called on strings return new string values.
Changing the case of a string
You can change the case of an entire string or individual characters.
Code
spongebob_case = 'i LikE cAkE'
# get a lowercase version of the string
a = spongebob_case.lower()
# get an uppercase version of the string
b = spongebob_case.upper()
# print the current values of all three variables
print(spongebob_case)
print(a)
print(b)
Output
i LikE cAkE
i like cake
I LIKE CAKE
Notice that even after the str.lower()
and str.upper()
methods are called, the value of spongebob_case
remains the same.
Many programs that work with strings forget to take case into account when checking their values.
To compare strings in a case-insensitive manner, use the str.casefold()
method. In addition to producing a string in lowercase, it converts foreign characters like ß
to a common set.
word_a = 'IMPASSE'
word_b = 'impaßee'
comparision0 = word_a.lower() == word_b.lower()
# comparision0 = False
comparision1 = word_a.casefold() == word_b.casefold()
# comparision1 = True
Selecting substrings with slices
Slicing is a powerful way of extracting substrings, smaller strings, from a string value.
The syntax for a slice looks like this:
str[start:stop:step]
Where str
is the string, start
is an integer for the starting position, end
is an integer for the ending position (both inclusive), and step
is the step size for the slice. The position numbers are optional.
message = 'Come meet me at the auditorium.'
message[20:] # 'auditorium.'
message[20:30] # 'auditorium'
message[20:30:2] # 'adtru'
message[20::2] # 'adtru.'
message[:10] # 'Come meet '
Even though the square bracket and colon str[::]
syntax of slices looks different from the dot operator, method, and parameter str.format(a, b)
syntax you will see on other string methods, slices do not modify the string they act on. Slices also return a new string.
Splitting strings on a character
Strings can contain a lot of useful data, often separated by specific delimiter characters. The str.split()
method breaks up a string into a list
of strings, using a given parameter as the separator.
Code
todays_date = '6/1/2018'
parts = todays_date.split('/')
# parts = ['6', '1', '2016']
print('Month: {}, Day: {}, Year: {}'.format(parts[0], parts[1], parts[2]))
Notice that the delimiter character is not preserved in the resulting list of strings.
Other common delimiters you may have seen in your life include: comma-separated value files (.csv), tab delimeters, dash delimeters, newline delimiters, and many more.
Output
Month: 6, Day: 1, Year: 2018
Checking if a string contains a substring
To check if a substring is part of a string, you can use the in
operator. This operation results in a boolean value: True
if the substring is found, False
otherwise.
title = 'The Sorcerer\'s Bone'
found_it = 'Sorcerer' in title
# found_it = True
You can also search for the position of a substring with the str.find()
method. The first argument is the substring to search for. The result will an integer representing the left-most position of the substring: the first occurrence from the left.
If the substring is not found, the result will be -1
.
title = 'The Sorcerer\'s Bone'
idx1 = title.index('er')
# idx1 = 8
idx2 = title.index('v')
# idx2 = -1
You can also supply additional arguments for the search:
str.find(substr, beg=0, end=len(str))
beg
is an integer for the beginning position of the search (inclusive), default =0
end
is an integer for the ending position of the search (exclusive), default =len(str)
title = 'The Sorcerer\'s Bone'
idx1 = title.index('er')
# idx1 = 8
# if not named, the paramaters are evaluate in this order: substr, beg, end
# if an `end` argument is not given, the search goes to the end of the string
idx2 = title.index('er', idx1 + 1)
# idx2 = 10
Replacing substrings
To replace all occurrences of one substring with another substring, use the str.replace()
method.
my_desserts = 'I like chocolate ice cream, strawberry ice cream, and banana ice cream.'
your_desserts = my_desserts.replace('ice cream', 'cake')
# your_desserts = 'I like chocolate cake, strawberry cake, and banana cake.'
Reversing a string
So far all the methods we have looked at for manipulating strings are built-in. You would think there is something like this for reversing strings, right?
Code
name = 'Shelly'
print(name.reverse())
Output
File "filename.py", line 1,
AttributeError: 'str' object has no attribute 'reverse'
Sike. There is no built-in string.reverse()
method. What is a programmer to do?
You could write your own method. Alternative, you could use one of these approaches:
# slice the characters of the string, but incrementing by -1 to go backwards
rev1 = name[::-1]
# rev1 = 'yllehS'
# turn the string into a reversed list and then rejoin them
rev2 = ''.join(reversed(name))
# rev2 = 'yllehS'
Want to know more about why there isn’t a built-in string.reverse()
method? Check out this StackOverflow post.