| Unicode in Python: Working With Character Encodings | https://realpython.com/courses/python-unicode/ |
| Christopher Trudeau | https://realpython.com/courses/python-unicode/#team |
| Recommended Tutorial | https://realpython.com/python-encodings-guide/ |
| Course Slides (.pdf) | https://realpython.com/courses/python-unicode/downloads/unicode-slides/ |
| Sample Code (.zip) | https://realpython.com/courses/python-unicode/downloads/unicode-sample-code/ |
| 00:00 | https://realpython.com/videos/python-unicode-overview/#t=0.48 |
| Welcome to Unicode and Character Encodings in Python. | https://realpython.com/videos/python-unicode-overview/#t=0.48 |
| My name is Chris and I will be your guide. | https://realpython.com/videos/python-unicode-overview/#t=3.36 |
| This course talks about what an encoding is and how it works, | https://realpython.com/videos/python-unicode-overview/#t=6.09 |
| where ASCII came from and how it evolved, | https://realpython.com/videos/python-unicode-overview/#t=9.78 |
| how binary bits can be described in oct and hex and how to use those to map | https://realpython.com/videos/python-unicode-overview/#t=13.14 |
| to code points, the Unicode standard and the UTF-8 encoding thereof, | https://realpython.com/videos/python-unicode-overview/#t=16.89 |
| how UTF-8 uses the underlying bits to encode a code point, | https://realpython.com/videos/python-unicode-overview/#t=22.14 |
| how multiple code points can result in a single character or glyph, functions | https://realpython.com/videos/python-unicode-overview/#t=26.28 |
| built into Python that can help you when you’re messing around with characters | https://realpython.com/videos/python-unicode-overview/#t=30.6 |
| in Unicode, and other encodings. First off, strings and character encoding is | https://realpython.com/videos/python-unicode-overview/#t=34.08 |
| one of the big changes between Python 2 and Python 3. In fact, | https://realpython.com/videos/python-unicode-overview/#t=39.24 |
| it’s one of the better reasons to move from Python 2 to Python 3. | https://realpython.com/videos/python-unicode-overview/#t=42.54 |
| 00:46 | https://realpython.com/videos/python-unicode-overview/#t=46.23 |
| All the examples in this course will be Python 3 based. If you’re using a | https://realpython.com/videos/python-unicode-overview/#t=46.23 |
| Python 2 interpreter, you’re not going to be able to follow along. | https://realpython.com/videos/python-unicode-overview/#t=50.04 |
| It’s really easy to forget when you’re programming in a nice high-level language | https://realpython.com/videos/python-unicode-overview/#t=53.61 |
| like Python that computers really only understand numbers. | https://realpython.com/videos/python-unicode-overview/#t=56.82 |
| 01:00 | https://realpython.com/videos/python-unicode-overview/#t=60.57 |
| When you’re dealing with text, | https://realpython.com/videos/python-unicode-overview/#t=60.57 |
| you’re actually dealing with a mapping between a number and a character that is | https://realpython.com/videos/python-unicode-overview/#t=61.89 |
| being displayed. | https://realpython.com/videos/python-unicode-overview/#t=66.06 |
| The fundamental item that is being stored in memory is still a number. ASCII was | https://realpython.com/videos/python-unicode-overview/#t=67.59 |
| one of the preeminent standards for this kind of mapping. | https://realpython.com/videos/python-unicode-overview/#t=72.9 |
| 01:16 | https://realpython.com/videos/python-unicode-overview/#t=76.17 |
| It specified that certain numbers represented certain letters, | https://realpython.com/videos/python-unicode-overview/#t=76.17 |
| and so when the computer used those numbers in the context of a string | https://realpython.com/videos/python-unicode-overview/#t=79.8 |
| it would produce the right letters. | https://realpython.com/videos/python-unicode-overview/#t=83.91 |
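The number-to-character mapping described above can be seen directly in Python. This is a minimal sketch, not code from the course: it uses the built-in `ord()` and `chr()` functions, and the sample text `"Hi!"` is just an illustration.

```python
# The same text seen two ways: as characters and as the numbers behind them.
text = "Hi!"

# ord() returns the number that stands for a single character.
numbers = [ord(ch) for ch in text]
print(numbers)  # [72, 105, 33]

# chr() goes the other way and turns each number back into a character.
restored = "".join(chr(n) for n in numbers)
print(restored)  # Hi!
```

Every number here is below 128, so the same values would apply under plain ASCII.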
| 01:26 | https://realpython.com/videos/python-unicode-overview/#t=86.19 |
| The problem with ASCII was it really only encoded the Latin alphabet. | https://realpython.com/videos/python-unicode-overview/#t=86.19 |
| It didn’t even include accented characters. | https://realpython.com/videos/python-unicode-overview/#t=89.13 |
| It was invented by and for English speakers; | https://realpython.com/videos/python-unicode-overview/#t=91.5 |
| it wasn’t until later that accents for other Western languages were added. By | https://realpython.com/videos/python-unicode-overview/#t=93.99 |
| contrast, | https://realpython.com/videos/python-unicode-overview/#t=98.64 |
| Unicode is an international standard and has enough space to encode all written | https://realpython.com/videos/python-unicode-overview/#t=99.3 |
| languages. In fact, it has space to encode other things as well, | https://realpython.com/videos/python-unicode-overview/#t=103.62 |
| like emojis. At one point in time, there was even a move to add Klingon to it, | https://realpython.com/videos/python-unicode-overview/#t=107.88 |
| but it was turned down. But there’s still space left over | https://realpython.com/videos/python-unicode-overview/#t=112.11 |
| if the standard body changes its mind. First off, a little history. | https://realpython.com/videos/python-unicode-overview/#t=115.11 |
| 02:00 | https://realpython.com/videos/python-unicode-overview/#t=120.03 |
| I think I mentioned that computers only understand numbers? Well, | https://realpython.com/videos/python-unicode-overview/#t=120.03 |
| computers only understand numbers. In fact, it’s even worse than that— | https://realpython.com/videos/python-unicode-overview/#t=123.03 |
| they really only understand binary. Everything is a 1 or a 0. | https://realpython.com/videos/python-unicode-overview/#t=126.6 |
| 02:10 | https://realpython.com/videos/python-unicode-overview/#t=130.8 |
| This goes down to how transistors work—they’re either on or off. | https://realpython.com/videos/python-unicode-overview/#t=130.8 |
| So, inside of the computer, everything is represented as either True or | https://realpython.com/videos/python-unicode-overview/#t=134.28 |
| False, on or off, or 1 or 0 to represent that. Everything on top of that is an | https://realpython.com/videos/python-unicode-overview/#t=137.49 |
| abstraction. | https://realpython.com/videos/python-unicode-overview/#t=144.3 |
| 02:25 | https://realpython.com/videos/python-unicode-overview/#t=145.86 |
| A byte is a grouping of bits. In the early history of computers, | https://realpython.com/videos/python-unicode-overview/#t=145.86 |
| the size of a byte differed from machine to machine. | https://realpython.com/videos/python-unicode-overview/#t=149.79 |
| By the time PCs came around, | https://realpython.com/videos/python-unicode-overview/#t=152.67 |
| there were 8 bits to a byte, and that’s pretty common now. | https://realpython.com/videos/python-unicode-overview/#t=154.41 |
| 02:37 | https://realpython.com/videos/python-unicode-overview/#t=157.77 |
| Now, most processors deal with more than one byte at a time, | https://realpython.com/videos/python-unicode-overview/#t=157.77 |
| but instead of redefining how big a byte is, | https://realpython.com/videos/python-unicode-overview/#t=160.71 |
| they have other terms like word for groupings of bytes. | https://realpython.com/videos/python-unicode-overview/#t=162.96 |
| 02:46 | https://realpython.com/videos/python-unicode-overview/#t=166.47 |
| An 8-bit byte can hold 2^8 combinations— | https://realpython.com/videos/python-unicode-overview/#t=166.47 |
| that’s 256. The counting starts at 0, | https://realpython.com/videos/python-unicode-overview/#t=169.92 |
| so the number range, instead of being from 1 to 256, is from 0 to 255. Back in | https://realpython.com/videos/python-unicode-overview/#t=173.76 |
| 03:00 | https://realpython.com/videos/python-unicode-overview/#t=180.22 |
| the olden times (and I’m talking about times | https://realpython.com/videos/python-unicode-overview/#t=180.22 |
| so old that even an old man like me thinks they’re the past), IBM introduced BCD, | https://realpython.com/videos/python-unicode-overview/#t=182.12 |
| or Binary Coded Decimal. This was an early encoding. It was very, | https://realpython.com/videos/python-unicode-overview/#t=186.79 |
| very simple and very small. It used 6 bits to represent a character. | https://realpython.com/videos/python-unicode-overview/#t=190.6 |
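The arithmetic behind those bit widths is easy to check. The sketch below is an illustration only, and `value_count` is a hypothetical helper name, not something from the course: n bits give 2**n combinations, numbered from 0 up to 2**n - 1.

```python
def value_count(bits: int) -> int:
    """Return how many distinct values fit in the given number of bits."""
    return 2 ** bits


# Compare the 6-bit BCD character size with an 8-bit byte.
for bits in (6, 8):
    count = value_count(bits)
    print(f"{bits} bits: {count} values, range 0 to {count - 1}")
# 6 bits: 64 values, range 0 to 63
# 8 bits: 256 values, range 0 to 255
```

With only 64 slots, a 6-bit encoding has no room for both uppercase and lowercase letters plus digits and punctuation.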
| 03:15 | https://realpython.com/videos/python-unicode-overview/#t=195.22 |
| This wasn’t enough to even fully cover the English language, | https://realpython.com/videos/python-unicode-overview/#t=195.22 |
| so IBM extended BCD with EBCDIC—Extended Binary | https://realpython.com/videos/python-unicode-overview/#t=197.8 |
| Coded Decimal Interchange Code. | https://realpython.com/videos/python-unicode-overview/#t=202.54 |
| 03:25 | https://realpython.com/videos/python-unicode-overview/#t=205.33 |
| This used a full 8 bits to describe a character and was so advanced | https://realpython.com/videos/python-unicode-overview/#t=205.33 |
| it actually included lowercase letters. Around the same time that EBCDIC was being | https://realpython.com/videos/python-unicode-overview/#t=209.53 |
| standardized, ASCII was introduced. | https://realpython.com/videos/python-unicode-overview/#t=214.03 |
| 03:36 | https://realpython.com/videos/python-unicode-overview/#t=216.55 |
| ASCII was put together by a standards body rather than by a single company and | https://realpython.com/videos/python-unicode-overview/#t=216.55 |
| became more popular across different platforms. | https://realpython.com/videos/python-unicode-overview/#t=220.09 |
| ASCII only required 7 bits, | https://realpython.com/videos/python-unicode-overview/#t=223.39 |
| but at the time most computers were using an 8-bit byte, | https://realpython.com/videos/python-unicode-overview/#t=225.46 |
| so the lead bit was just left as 0. Sometimes, using some transmission protocols | https://realpython.com/videos/python-unicode-overview/#t=228.55 |
| like over modems or terminals, that 8th bit would be used as a parity bit to | https://realpython.com/videos/python-unicode-overview/#t=233.8 |
| make sure that the byte had been transmitted correctly. | https://realpython.com/videos/python-unicode-overview/#t=238.21 |
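Here is one way the spare 8th bit could carry a parity check. This is a sketch under assumptions: the transcript does not say whether even or odd parity was used, so even parity is assumed here, and `add_even_parity` and `parity_ok` are hypothetical helper names.

```python
def add_even_parity(code: int) -> int:
    """Return an 8-bit value: the 7-bit code plus an even-parity bit on top."""
    if not 0 <= code < 128:
        raise ValueError("expected a 7-bit value (0 to 127)")
    ones = bin(code).count("1")
    parity = ones % 2  # 1 only when the count of 1 bits is odd
    return (parity << 7) | code


def parity_ok(byte: int) -> bool:
    """Check that an 8-bit value has an even number of 1 bits."""
    return bin(byte).count("1") % 2 == 0


a = ord("A")  # 65 is 0b1000001, which has two 1 bits
c = ord("C")  # 67 is 0b1000011, which has three 1 bits
print(format(add_even_parity(a), "08b"))  # 01000001
print(format(add_even_parity(c), "08b"))  # 11000011

# Flipping any single bit in transit makes the check fail.
damaged = add_even_parity(c) ^ 0b00000001
print(parity_ok(add_even_parity(c)), parity_ok(damaged))  # True False
```

A single parity bit only detects an odd number of flipped bits; it cannot say which bit changed.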
| 04:01 | https://realpython.com/videos/python-unicode-overview/#t=241.9 |
| ASCII was adopted as an international standard in 1967, and quickly | https://realpython.com/videos/python-unicode-overview/#t=241.9 |
| there were several iterations and extensions made on top of it. | https://realpython.com/videos/python-unicode-overview/#t=245.92 |
| The extended ASCII format moved to a full 8 bits of description and added | https://realpython.com/videos/python-unicode-overview/#t=249.49 |
| accented characters, | https://realpython.com/videos/python-unicode-overview/#t=253.93 |
| allowing Western languages that were not English to be described. | https://realpython.com/videos/python-unicode-overview/#t=255.25 |
| 04:19 | https://realpython.com/videos/python-unicode-overview/#t=259.18 |
| PCs used ASCII, so when they became the de facto standard, | https://realpython.com/videos/python-unicode-overview/#t=259.18 |
| ASCII became the way of communicating between computers. For clarity’s sake, | https://realpython.com/videos/python-unicode-overview/#t=262.33 |
| let’s establish some common terminology. First off, what’s a character? | https://realpython.com/videos/python-unicode-overview/#t=266.59 |
| 04:31 | https://realpython.com/videos/python-unicode-overview/#t=271.24 |
| This probably feels clear to you—it’s that one little single unit of text— | https://realpython.com/videos/python-unicode-overview/#t=271.24 |
| but this term can actually get a little confusing depending on who you’re | https://realpython.com/videos/python-unicode-overview/#t=275.38 |
| talking to. So for the purposes of this course, | https://realpython.com/videos/python-unicode-overview/#t=278.62 |
| the word character is going to mean a minimal unit of text that has a semantic | https://realpython.com/videos/python-unicode-overview/#t=281.47 |
| value. So, that includes things like emojis, or symbols in Han Chinese, | https://realpython.com/videos/python-unicode-overview/#t=285.64 |
| as well as obvious stuff like the letter A. | https://realpython.com/videos/python-unicode-overview/#t=290.08 |
| 04:52 | https://realpython.com/videos/python-unicode-overview/#t=292.66 |
| A character set is just a collection of these characters, | https://realpython.com/videos/python-unicode-overview/#t=292.66 |
| and these sets can be used across multiple languages. | https://realpython.com/videos/python-unicode-overview/#t=295.66 |
| Think about the Latin character set that most European languages can use, | https://realpython.com/videos/python-unicode-overview/#t=298.84 |
| the Greek character set that pretty much only the Greek language can use, | https://realpython.com/videos/python-unicode-overview/#t=302.95 |
| and the Russian character set, which is used across certain Slavic languages. | https://realpython.com/videos/python-unicode-overview/#t=306.34 |
| 05:10 | https://realpython.com/videos/python-unicode-overview/#t=310.93 |
| A code point is a number that represents a single character in one of these sets | https://realpython.com/videos/python-unicode-overview/#t=310.93 |
| of encoded characters. For example, in the ASCII standard, | https://realpython.com/videos/python-unicode-overview/#t=315.94 |
| the capital letter 'A' is the decimal number 65. | https://realpython.com/videos/python-unicode-overview/#t=320.86 |
| 05:25 | https://realpython.com/videos/python-unicode-overview/#t=325.69 |
| A code unit, by contrast to a code point, is a sequence of bits that represents | https://realpython.com/videos/python-unicode-overview/#t=325.69 |
| that code point. In ASCII, | https://realpython.com/videos/python-unicode-overview/#t=331.03 |
| the code point 65 means 'A', and it’s stored in the computer using that number. | https://realpython.com/videos/python-unicode-overview/#t=333.16 |
| 05:37 | https://realpython.com/videos/python-unicode-overview/#t=337.72 |
| In other encoding standards that mapping may not apply. | https://realpython.com/videos/python-unicode-overview/#t=337.72 |
| As I mentioned before, in the original ASCII standard, | https://realpython.com/videos/python-unicode-overview/#t=340.72 |
| a code unit was 7 bits long, | https://realpython.com/videos/python-unicode-overview/#t=343.87 |
| so that covered from the numbers 0 to 127. Unicode supports different kinds of | https://realpython.com/videos/python-unicode-overview/#t=346.0 |
| encodings, and some of those even have varying length code units. | https://realpython.com/videos/python-unicode-overview/#t=351.52 |
| 05:55 | https://realpython.com/videos/python-unicode-overview/#t=355.87 |
| UTF-8, one of those encodings, is an 8-bit encoding, | https://realpython.com/videos/python-unicode-overview/#t=355.87 |
| but each code point can map to 1, 2, 3, or 4 code units, | https://realpython.com/videos/python-unicode-overview/#t=359.81 |
| so multiple bytes may be describing a single code point. | https://realpython.com/videos/python-unicode-overview/#t=364.61 |
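The difference between one code point and its code units shows up as soon as you encode a string. The following is a small sketch, not code from the course; the four sample characters are my own picks, chosen so that each needs a different number of UTF-8 bytes.

```python
# One character each for the 1-, 2-, 3-, and 4-byte UTF-8 forms.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji

for ch in samples:
    encoded = ch.encode("utf-8")  # bytes object: one item per 8-bit code unit
    print(f"U+{ord(ch):04X} needs {len(encoded)} byte(s)")

lengths = [len(ch.encode("utf-8")) for ch in samples]
print(lengths)  # [1, 2, 3, 4]
```

Each sample is a single code point, so `len(ch)` is 1 for all of them; only the encoded byte count changes.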
| 06:08 | https://realpython.com/videos/python-unicode-overview/#t=368.96 |
| That’s enough background. | https://realpython.com/videos/python-unicode-overview/#t=368.96 |
| Let’s look at some code. In order to inspect some strings, | https://realpython.com/videos/python-unicode-overview/#t=369.98 |
| I’ve written a quick little function inside of a file called show.py. The core | https://realpython.com/videos/python-unicode-overview/#t=373.19 |
| part of this function is line 5, | https://realpython.com/videos/python-unicode-overview/#t=377.78 |
| which uses the built-in ord() function, returning the code point of the character | https://realpython.com/videos/python-unicode-overview/#t=380.42 |
| that is passed in. | https://realpython.com/videos/python-unicode-overview/#t=385.01 |
| 06:29 | https://realpython.com/videos/python-unicode-overview/#t=389.63 |
| I’m going to import that function into the REPL and start with a simple string | https://realpython.com/videos/python-unicode-overview/#t=389.63 |
| in English saying 'Hello there'. | https://realpython.com/videos/python-unicode-overview/#t=394.91 |
| Calling code_points() on that prints out the code point for each one of | https://realpython.com/videos/python-unicode-overview/#t=396.79 |
| the values in the str (string). | https://realpython.com/videos/python-unicode-overview/#t=400.91 |
| 06:42 | https://realpython.com/videos/python-unicode-overview/#t=402.86 |
| If you look at this, capital 'H' is 72 in ASCII, | https://realpython.com/videos/python-unicode-overview/#t=402.86 |
| so it maps down below. Six characters in, you’ll see 32— | https://realpython.com/videos/python-unicode-overview/#t=406.88 |
| that’s a space (" ") in ASCII. Notice that every one of these numbers is below 128— | https://realpython.com/videos/python-unicode-overview/#t=411.77 |
| that means they’re in the range of the original ASCII 7-bit standard. | https://realpython.com/videos/python-unicode-overview/#t=417.74 |
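The contents of show.py are not shown in the transcript, so the block below is only a guess at what a `code_points()` helper could look like. The one detail taken from the lesson is that it relies on the built-in `ord()`; the layout, the return value, and the printed format are assumptions.

```python
def code_points(text: str) -> list[int]:
    """Print each character beside its code point and return the numbers.

    Hypothetical stand-in for the helper in show.py, built around ord().
    """
    points = [ord(ch) for ch in text]
    for ch, point in zip(text, points):
        print(f"{ch!r:>5} {point}")
    return points


points = code_points("Hello there")
print(points[0])                     # 72, the capital 'H'
print(points[5])                     # 32, the space, six characters in
print(all(p < 128 for p in points))  # True: everything fits in 7-bit ASCII
```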
| 07:02 | https://realpython.com/videos/python-unicode-overview/#t=422.15 |
| Let’s look at something a little more challenging. | https://realpython.com/videos/python-unicode-overview/#t=422.15 |
| 07:05 | https://realpython.com/videos/python-unicode-overview/#t=425.18 |
| Here’s some Russian that says “do svidaniya”, or at least, that’s what the web page I | https://realpython.com/videos/python-unicode-overview/#t=425.18 |
| copied it from said it did— | https://realpython.com/videos/python-unicode-overview/#t=430.1 |
| I hope it says that. Running code_points() on it, | https://realpython.com/videos/python-unicode-overview/#t=431.57 |
| 07:16 | https://realpython.com/videos/python-unicode-overview/#t=436.88 |
| you get a significantly larger set of numbers. | https://realpython.com/videos/python-unicode-overview/#t=436.88 |
| Now, the third character in is 32—a space—just like in 'Hello there'. | https://realpython.com/videos/python-unicode-overview/#t=440.15 |
| And if you look near the end here, there’s a character that’s 225, | https://realpython.com/videos/python-unicode-overview/#t=445.46 |
| which is below 256 in the extended ASCII range. | https://realpython.com/videos/python-unicode-overview/#t=449.57 |
| 07:34 | https://realpython.com/videos/python-unicode-overview/#t=454.37 |
| That is the accented | https://realpython.com/videos/python-unicode-overview/#t=454.37 |
| 'á'. Everything else here is from the Cyrillic alphabet, | https://realpython.com/videos/python-unicode-overview/#t=456.23 |
| which has much higher code point numbers above the ASCII range. All of these, | https://realpython.com/videos/python-unicode-overview/#t=460.04 |
| as you’ll notice, are sort of around a thousand. | https://realpython.com/videos/python-unicode-overview/#t=464.63 |
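To see those three ranges side by side, you can sort the code points into buckets. This is a sketch under an assumption: the exact string used in the video is not in the transcript, so the phrase below is my reconstruction, with the accented letter written as the escape `\u00e1` to make sure it is the single Latin code point 225.

```python
# Assumed phrase: Cyrillic letters, one space, and a Latin a-acute (U+00E1).
phrase = "до свид\u00e1ния"

points = [ord(ch) for ch in phrase]
print(points)
# [1076, 1086, 32, 1089, 1074, 1080, 1076, 225, 1085, 1080, 1103]

# Bucket the values by the ranges discussed above.
ascii_range = [p for p in points if p < 128]
extended = [p for p in points if 128 <= p < 256]
higher = [p for p in points if p >= 256]

print(ascii_range)  # [32]
print(extended)     # [225]
print(len(higher))  # 9
```

The nine Cyrillic letters all land between 1074 and 1103, which matches the "around a thousand" observation.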
| 07:48 | https://realpython.com/videos/python-unicode-overview/#t=468.35 |
| That’s it for the introduction. Next up, | https://realpython.com/videos/python-unicode-overview/#t=468.35 |
| I’ll dive deeper into Python strings and their relationship to ASCII. | https://realpython.com/videos/python-unicode-overview/#t=470.45 |
|
Unicode in Python: Working With Character Encodings (Overview) 07:56
| https://realpython.com/videos/python-unicode-overview/ |
|
Working With ASCII and the Python String Module 05:49
| https://realpython.com/videos/ascii-python-string-module/ |
|
Working in Binary: Bits, Bytes, Oct, and Hex 06:26
| https://realpython.com/lessons/bits-bytes-oct-hex/ |
|
Using Unicode 04:15
| https://realpython.com/lessons/using-unicode/ |
|
Encoding UTF-8 06:19
| https://realpython.com/lessons/encoding-utf8/ |
|
Combining Characters 05:40
| https://realpython.com/lessons/combining-characters/ |
|
Using Built-In Functions 05:38
| https://realpython.com/lessons/built-in-functions/ |
|
Using Other Encodings 04:45
| https://realpython.com/lessons/other-encodings/ |
|
Unicode in Python: Working With Character Encodings (Summary) 04:53
| https://realpython.com/lessons/python-unicode-summary/ |