Common Built-in data types
Last updated
Strings
A string is a sequence of characters. You can access the characters one at a time with the bracket operator:
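Here is a minimal sketch of the bracket operator. It assumes a variable fruit that holds 'banana', the example word used throughout this section:

```python
fruit = 'banana'    # illustrative value
letter = fruit[1]   # select the character at index 1 and bind it to letter
```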
The second statement selects character number 1 from fruit and assigns it to letter. The expression in brackets is called an index. The index indicates which character in the sequence you want (hence the name). But be careful with indices; you might not get what you expect:
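Printing the selected character shows the surprise. This sketch repeats the assumed assignment fruit = 'banana' so that it runs on its own:

```python
fruit = 'banana'
letter = fruit[1]
print(letter)     # prints: a  (not b)
print(fruit[0])   # prints: b  (index 0 selects the first character)
```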
For most people, the first letter (meaning at position 1) of 'banana' is b, not a. But for computer scientists, the index is an offset from the beginning of the string, and the offset of the first letter is zero.
So b is the 0th letter ("zero-eth") of 'banana', a is the 1th letter ("one-eth"), and n is the 2th letter ("two-eth").
You can use any expression, including variables and operators, as an index, but the value of the index has to be an integer. Otherwise you get a TypeError:
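A short sketch of both cases, assuming fruit = 'banana': an index computed from a variable works, while a float index raises TypeError. The exact wording of the error message differs between Python versions, so the code only prints whatever message it gets:

```python
fruit = 'banana'
i = 1
print(fruit[i + 1])   # any integer expression works: prints n

try:
    fruit[1.5]        # a float is not a valid index
except TypeError as err:
    print('TypeError:', err)
```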
A segment of a string is called a slice. Selecting a slice is similar to selecting a character:
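For example, with an illustrative string s = 'Monty Python' (the value is an assumption chosen for this sketch):

```python
s = 'Monty Python'
print(s[0:5])    # characters 0 through 4: prints Monty
print(s[6:12])   # characters 6 through 11: prints Python
```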
The operator [n:m] returns the part of the string from the "n-eth" character to the "m-eth" character, including the first but excluding the last. This behaviour is counterintuitive, but it might help to imagine the indices pointing between the characters, as in the following diagram:
If you omit the first index (before the colon), the slice starts at the beginning of the string. If you omit the second index, the slice goes to the end of the string:
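A minimal sketch of omitted indices, again assuming fruit = 'banana':

```python
fruit = 'banana'
print(fruit[:3])   # from the start up to (not including) index 3: prints ban
print(fruit[3:])   # from index 3 to the end: prints ana
```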
If the first index is greater than or equal to the second, the result is an empty string, represented by two quotation marks:
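Both situations can be checked with a small sketch; repr is used so that the empty string shows up as its two quotation marks:

```python
fruit = 'banana'
print(repr(fruit[3:3]))   # first index equals second: prints ''
print(len(fruit[4:2]))    # first index greater than second: also empty, prints 0
```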
An empty string contains no characters and has length 0, but other than that, it is the same as any other string.
It is tempting to use the [] operator on the left side of an assignment, with the intention of changing a character in a string. For example:
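A sketch of the attempt, assuming greeting holds 'Hello, world!' (an illustrative value). The assignment raises a TypeError, which the try block catches so that the message can be printed:

```python
greeting = 'Hello, world!'
try:
    greeting[0] = 'J'          # strings do not support item assignment
except TypeError as err:
    # message: 'str' object does not support item assignment
    print('TypeError:', err)
```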
The result is a TypeError stating that a 'str' object does not support item assignment. The "object" in this case is the string and the "item" is the character you tried to assign. For now, an object is the same thing as a value, but we will refine that definition later. An item is one of the values in a sequence. The reason for the error is that strings are immutable, which means you can't change an existing string. The best you can do is create a new string that is a variation on the original:
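A minimal sketch, assuming greeting = 'Hello, world!'; the name new_greeting is hypothetical:

```python
greeting = 'Hello, world!'
new_greeting = 'J' + greeting[1:]   # a new first letter plus a slice of the original
print(new_greeting)                 # prints: Jello, world!
print(greeting)                     # the original is unchanged: Hello, world!
```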
This example concatenates a new first letter onto a slice of greeting. It has no effect on the original string.
Lists
Like a string, a list is a sequence of values. In a string, the values are all characters; in a list, they can be any type. The values in a list are called elements or sometimes items.
There are several ways to create a new list; the simplest is to enclose the elements in square brackets ([ and ]):
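Two illustrative lists follow. The values are arbitrary, and the variable names ints and words are hypothetical, added only so the lists can be inspected:

```python
ints = [10, 20, 30, 40]                                 # a list of four integers
words = ['crunchy frog', 'ram bladder', 'lark vomit']   # a list of three strings
print(ints)
print(words)
```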
The first example is a list of four integers. The second is a list of three strings. The elements of a list don't have to be the same type. The following list contains a string, a float, an integer, and another list:
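Such a mixed list might look like this (the values and the name mixed are illustrative):

```python
mixed = ['spam', 2.0, 5, [10, 20]]          # a string, a float, an integer and a nested list
print([type(x).__name__ for x in mixed])    # prints: ['str', 'float', 'int', 'list']
```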
A list within another list is said to be nested. A list that contains no elements is called an empty list; you can create one with empty brackets, []. As you might expect, you can assign list values to variables:
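The following sketch assigns three lists to variables. The names cheeses, numbers and empty are the ones used in the state diagram discussion below; the cheese names and the first number (17) are illustrative assumptions:

```python
cheeses = ['Cheddar', 'Edam', 'Gouda']
numbers = [17, 123]
empty = []
print(cheeses, numbers, empty)   # prints: ['Cheddar', 'Edam', 'Gouda'] [17, 123] []
```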
The syntax for accessing the elements of a list is the same as for accessing the characters of a string: the bracket operator. The expression inside the brackets specifies the index. Remember that the indices start at 0:
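A minimal sketch, using illustrative values for cheeses:

```python
cheeses = ['Cheddar', 'Edam', 'Gouda']
print(cheeses[0])   # index 0 is the first element: prints Cheddar
```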
Unlike strings, lists are mutable. When the bracket operator appears on the left side of an assignment, it identifies the element of the list that will be assigned.
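For example, assuming numbers starts as [17, 123] (17 is an illustrative value):

```python
numbers = [17, 123]
numbers[1] = 5    # the bracket on the left selects the element to reassign
print(numbers)    # prints: [17, 5]
```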
The one-eth element of numbers, which used to be 123, is now 5.
You can think of a list as a relationship between indices and elements. This relationship is called a mapping; each index "maps to" one of the elements. Here is a state diagram showing cheeses, numbers and empty:
Lists are represented by boxes with the word "list" outside and the elements of the list inside. The list cheeses has three elements, indexed 0, 1 and 2. The list numbers contains two elements; the diagram shows that the value of the second element has been reassigned from 123 to 5. The list empty has no elements.
List indices work the same way as string indices:
Any integer expression can be used as an index.
If you try to read or write an element that does not exist, you get an IndexError.
If an index has a negative value, it counts backward from the end of the list.
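The three rules above can be checked with a short sketch; the cheeses values are illustrative:

```python
cheeses = ['Cheddar', 'Edam', 'Gouda']
i = 0
print(cheeses[i + 2])   # any integer expression: prints Gouda
print(cheeses[-1])      # a negative index counts from the end: prints Gouda

try:
    cheeses[3]          # only indices 0, 1 and 2 exist
except IndexError as err:
    print('IndexError:', err)   # message: list index out of range
```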
Like strings, lists can be sliced; the slice operator works on lists the same way:
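A minimal sketch with an illustrative six-element list t:

```python
t = ['a', 'b', 'c', 'd', 'e', 'f']
print(t[1:3])   # prints: ['b', 'c']
print(t[:4])    # prints: ['a', 'b', 'c', 'd']
print(t[3:])    # prints: ['d', 'e', 'f']
```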
If you omit the first index, the slice starts at the beginning. If you omit the second, the slice goes to the end. So if you omit both, the slice is a copy of the whole list.
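The sketch below shows that the full slice really is a separate copy; the name t_copy is hypothetical:

```python
t = ['a', 'b', 'c', 'd', 'e', 'f']
t_copy = t[:]         # omitting both indices copies the whole list
t_copy[0] = 'z'       # changing the copy ...
print(t[0])           # ... leaves the original untouched: prints a
print(t_copy is t)    # prints: False (two distinct list objects)
```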
Since lists are mutable, it is often useful to make a copy before performing operations that fold, spindle or mutilate lists.
A slice operator on the left side of an assignment can update multiple elements:
You can add elements to a list by squeezing them into an empty slice:
And you can remove elements from a list by assigning the empty list to them:
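A sketch of the three slice assignments just described (update, insert, remove), applied in turn to an illustrative list t:

```python
t = ['a', 'b', 'c', 'd', 'e', 'f']
t[1:3] = ['x', 'y']    # replace elements 1 and 2
print(t)               # prints: ['a', 'x', 'y', 'd', 'e', 'f']

t[1:1] = ['p', 'q']    # squeeze new elements into the empty slice before index 1
print(t)               # prints: ['a', 'p', 'q', 'x', 'y', 'd', 'e', 'f']

t[1:3] = []            # assign the empty list to remove elements 1 and 2
print(t)               # prints: ['a', 'x', 'y', 'd', 'e', 'f']
```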
But both of those operations can be expressed more clearly with list methods shown in the section "more on List".
Tuples
A tuple is a sequence of values. The values can be any type, and they are indexed by integers, so in that respect tuples are a lot like lists. The important difference is that tuples are immutable (like strings). Syntactically, a tuple is a comma-separated list of values:
Although it is not necessary, it is common to enclose tuples in parentheses:
To create a tuple with a single element, you have to include a final comma:
Be careful: a single value in parentheses, without a trailing comma, is not a tuple:
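The four ways of writing values described above can be compared in one sketch; all names and values are illustrative:

```python
t = 'a', 'b', 'c', 'd', 'e'           # the commas alone make a tuple
t_paren = ('a', 'b', 'c', 'd', 'e')   # parentheses are optional but common
t1 = 'a',                             # a single-element tuple needs the final comma
t2 = ('a')                            # just a string in parentheses
print(type(t), type(t_paren), type(t1), type(t2))
# prints: <class 'tuple'> <class 'tuple'> <class 'tuple'> <class 'str'>
```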
Another way to create a tuple is the built-in function tuple. With no argument, it creates an empty tuple:
If the argument is a sequence (string, list or tuple), the result is a tuple with the elements of the sequence:
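A minimal sketch of both uses of tuple; the result is stored in names other than tuple so the built-in is not shadowed:

```python
empty = tuple()             # no argument: an empty tuple
print(empty)                # prints: ()
letters = tuple('lupins')   # a string is a sequence of characters
print(letters)              # prints: ('l', 'u', 'p', 'i', 'n', 's')
print(tuple([1, 2, 3]))     # from a list: prints (1, 2, 3)
```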
Because tuple is the name of a built-in function, you should avoid using it as a variable name. Most list operators also work on tuples. The bracket operator indexes an element:
As with strings and lists, the slice operator selects a range of elements:
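Indexing and slicing an illustrative tuple t look exactly as they do for lists:

```python
t = ('a', 'b', 'c', 'd', 'e')
print(t[0])     # the bracket operator: prints a
print(t[1:3])   # a slice of a tuple is a tuple: prints ('b', 'c')
```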
But if you try to modify one of the elements of the tuple, you get an error:
You can't modify the elements of a tuple, but you can replace one tuple with another:
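A sketch of both points, using an illustrative tuple t. The item assignment fails, but building a new tuple and rebinding the name t works:

```python
t = ('a', 'b', 'c', 'd', 'e')
try:
    t[0] = 'A'                 # tuples are immutable
except TypeError as err:
    # message: 'tuple' object does not support item assignment
    print('TypeError:', err)

t = ('A',) + t[1:]   # concatenate a new one-element tuple with a slice
print(t)             # prints: ('A', 'b', 'c', 'd', 'e')
```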
It is often useful to swap the values of two variables. With conventional assignments, you have to use a temporary variable. For example, to swap a and b:
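A minimal sketch of the conventional swap; the starting values and the name temp are illustrative:

```python
a = 1
b = 2
temp = a   # save a before it is overwritten
a = b
b = temp
print(a, b)   # prints: 2 1
```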
This solution is cumbersome; tuple assignment is more elegant:
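The same swap with tuple assignment, using the same illustrative starting values:

```python
a = 1
b = 2
a, b = b, a   # the right side is fully evaluated before anything is assigned
print(a, b)   # prints: 2 1
```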
The left side is a tuple of variables; the right side is a tuple of expressions. Each value is assigned to its respective variable. All the expressions on the right side are evaluated before any of the assignments. The number of variables on the left and the number of values on the right have to be the same:
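A sketch of a mismatch: three values cannot be unpacked into two variables, so Python raises a ValueError. The flag unpacked is hypothetical, added only to record the outcome, and the exact message wording can vary between Python versions:

```python
values = (1, 2, 3)
try:
    a, b = values          # three values, only two variables
    unpacked = True
except ValueError as err:
    unpacked = False
    print('ValueError:', err)   # mentions too many values to unpack
```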
More generally, the right side can be any kind of sequence (string, list or tuple). For example, to split an email address into a user name and a domain, you could write:
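A minimal sketch, assuming an illustrative address 'monty@python.org'; str.split with '@' as the separator returns a two-element list:

```python
addr = 'monty@python.org'
uname, domain = addr.split('@')   # split returns ['monty', 'python.org']
print(uname)    # prints: monty
print(domain)   # prints: python.org
```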
The return value from split is a list with two elements; the first element is assigned to uname, the second to domain.
It is worth reading the documentation for the split method of the string (str) type, as it is a very useful method.
len
len is a built-in function that returns the number of characters in a string or the number of elements in a list or a tuple:
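A minimal sketch with one string, one list and one tuple (the values are illustrative):

```python
fruit = 'banana'
print(len(fruit))             # number of characters: prints 6
print(len(['a', 'b', 'c']))   # number of elements in a list: prints 3
print(len(('x', 'y')))        # number of elements in a tuple: prints 2
```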
To get the last character of a string (or the last element of a list or a tuple), you might be tempted to try something like this:
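Such an attempt might look like the following sketch, assuming fruit = 'banana' and a variable length holding its length. The try block catches the resulting IndexError:

```python
fruit = 'banana'
length = len(fruit)
try:
    last = fruit[length]        # index 6 does not exist
except IndexError as err:
    print('IndexError:', err)   # message: string index out of range
```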
The reason for the IndexError is that there is no letter in 'banana' with the index 6. Since we started counting at zero, the six letters are numbered 0 to 5. To get the last character, you have to subtract 1 from length:
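A minimal sketch of the corrected version, with the same assumed fruit and length:

```python
fruit = 'banana'
length = len(fruit)
last = fruit[length - 1]   # the last valid index is length - 1
print(last)                # prints: a
```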
Alternatively, you can use negative indices, which count backward from the end of the string. The expression fruit[-1] yields the last letter, fruit[-2] yields the second to last, and so on.
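A short sketch of negative indices on the assumed string fruit = 'banana', plus an illustrative list nums showing that lists behave the same way:

```python
fruit = 'banana'
print(fruit[-1])   # last letter: prints a
print(fruit[-2])   # second to last: prints n

nums = [10, 20, 30]
print(nums[-1])    # negative indices work on lists (and tuples) too: prints 30
```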