| Unicode in Python: Working With Character Encodings | https://realpython.com/courses/python-unicode/ |
| Christopher Trudeau | https://realpython.com/courses/python-unicode/#team |
| Recommended Tutorial | https://realpython.com/python-encodings-guide/ |
| Course Slides (.pdf) | https://realpython.com/courses/python-unicode/downloads/unicode-slides/ |
| Sample Code (.zip) | https://realpython.com/courses/python-unicode/downloads/unicode-sample-code/ |
| 00:00 | https://realpython.com/lessons/python-unicode-summary/#t=0.48 |
| Well, you’ve made it through eight lessons on Unicode. | https://realpython.com/lessons/python-unicode-summary/#t=0.48 |
| You’ll recall that I started off with the basics of encoding, | https://realpython.com/lessons/python-unicode-summary/#t=3.33 |
| talked about the Python string module and the constants it provides for | https://realpython.com/lessons/python-unicode-summary/#t=6.63 |
| working with ASCII, | https://realpython.com/lessons/python-unicode-summary/#t=10.29 |
| took a detour down Computer Science Lane and talked about bits and bytes and how | https://realpython.com/lessons/python-unicode-summary/#t=11.94 |
| they can be represented in oct and hex. | https://realpython.com/lessons/python-unicode-summary/#t=16.71 |
| 00:19 | https://realpython.com/lessons/python-unicode-summary/#t=19.41 |
| And no Unicode course would be complete without a section on Unicode. | https://realpython.com/lessons/python-unicode-summary/#t=19.41 |
| Lesson 5 talked about how UTF-8 actually is represented in binary. | https://realpython.com/lessons/python-unicode-summary/#t=23.55 |
| Lesson 6 looked at digraphs and ligatures and other kinds of combined | https://realpython.com/lessons/python-unicode-summary/#t=28.53 |
| characters. | https://realpython.com/lessons/python-unicode-summary/#t=32.4 |
| 00:33 | https://realpython.com/lessons/python-unicode-summary/#t=33.81 |
| Lesson 7 gave a tour of built-in Python functions that are helpful when | https://realpython.com/lessons/python-unicode-summary/#t=33.81 |
| dealing with Unicode or byte conversion. | https://realpython.com/lessons/python-unicode-summary/#t=38.04 |
| And the last lesson was on encodings besides UTF-8. | https://realpython.com/lessons/python-unicode-summary/#t=41.04 |
| 00:45 | https://realpython.com/lessons/python-unicode-summary/#t=45.42 |
| In this lesson, | https://realpython.com/lessons/python-unicode-summary/#t=45.42 |
| I’m going to talk about a couple of remaining corner cases and point you at some | https://realpython.com/lessons/python-unicode-summary/#t=46.05 |
| references and possible future reading material. | https://realpython.com/lessons/python-unicode-summary/#t=49.26 |
| It’s important to remember that all input is bytes until it’s decoded. | https://realpython.com/lessons/python-unicode-summary/#t=52.95 |
| 00:57 | https://realpython.com/lessons/python-unicode-summary/#t=57.39 |
| If you assume an encoding for your data, you may run into trouble. | https://realpython.com/lessons/python-unicode-summary/#t=57.39 |
| Let’s say you were accessing a recipe site API, | https://realpython.com/lessons/python-unicode-summary/#t=61.08 |
| and you got the following chunk of data. | https://realpython.com/lessons/python-unicode-summary/#t=63.96 |
| 01:09 | https://realpython.com/lessons/python-unicode-summary/#t=69.81 |
| If you make an assumption about the decoding… | https://realpython.com/lessons/python-unicode-summary/#t=69.81 |
| 01:15 | https://realpython.com/lessons/python-unicode-summary/#t=75.03 |
| you could be in trouble. On its own, the byte 0xbc is not valid UTF-8. | https://realpython.com/lessons/python-unicode-summary/#t=75.03 |
| 01:23 | https://realpython.com/lessons/python-unicode-summary/#t=83.91 |
| Change the encoding to Latin-1, | https://realpython.com/lessons/python-unicode-summary/#t=83.91 |
| and all of a sudden the data makes an awful lot more sense. | https://realpython.com/lessons/python-unicode-summary/#t=85.68 |
| 01:31 | https://realpython.com/lessons/python-unicode-summary/#t=91.59 |
| The symbol for one quarter in UTF-8 isn’t bc, | https://realpython.com/lessons/python-unicode-summary/#t=91.59 |
| but c2 bc. There are worse cases than getting an exception. | https://realpython.com/lessons/python-unicode-summary/#t=94.98 |
| At least when you get an exception, you know something went wrong. | https://realpython.com/lessons/python-unicode-summary/#t=99.78 |
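The recipe data itself is not shown in the transcript, so the following is only a sketch of the situation described above. The byte string `raw` is a hypothetical stand-in for the API response; the point is that the same bytes fail as UTF-8 but decode cleanly as Latin-1.

```python
# Hypothetical payload: the real API data is not shown in the transcript.
# In Latin-1, the single byte 0xbc is the one-quarter symbol.
raw = b"\xbc cup of flour"

# Assuming UTF-8 fails, because 0xbc cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding with the encoding the data was actually written in works.
text = raw.decode("latin-1")

# In UTF-8, the same symbol takes two bytes: c2 bc.
quarter_utf8 = "\u00bc".encode("utf-8")

print(utf8_error)
print(text)
print(quarter_utf8)
```

Catching `UnicodeDecodeError` here is only for demonstration; in real code the better fix is usually to find out which encoding the source declares rather than guessing.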
| 01:43 | https://realpython.com/lessons/python-unicode-summary/#t=103.2 |
| Consider the following piece of Norse. Encoding it… | https://realpython.com/lessons/python-unicode-summary/#t=103.2 |
| 01:51 | https://realpython.com/lessons/python-unicode-summary/#t=111.54 |
| and then decoding it as UTF-16 by accident results in a different character. | https://realpython.com/lessons/python-unicode-summary/#t=111.54 |
| No error, no exception. | https://realpython.com/lessons/python-unicode-summary/#t=117.57 |
| Your data is now dirty and wherever you put it, it’ll be wrong. | https://realpython.com/lessons/python-unicode-summary/#t=119.7 |
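The exact text used in the lesson is not reproduced in the transcript, so this sketch picks its own example: the letter thorn (U+00FE), encoded as UTF-8 and then decoded as little-endian UTF-16. Both choices are assumptions made for illustration. The silent failure it shows is the one described above.

```python
# Illustrative character, not necessarily the one used in the lesson:
# U+00FE LATIN SMALL LETTER THORN.
original = "\u00fe"

# Encode as UTF-8: two bytes, c3 be.
encoded = original.encode("utf-8")

# Decode with the wrong codec. "utf-16-le" is used instead of "utf-16"
# so the result does not depend on the machine's native byte order.
garbled = encoded.decode("utf-16-le")

# No exception is raised, but the text has changed.
print(repr(original), repr(garbled))
print(garbled == original)
```

The two UTF-8 bytes happen to form one valid UTF-16 code unit, so Python returns a single, unrelated character and nothing signals the mistake.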
| 02:05 | https://realpython.com/lessons/python-unicode-summary/#t=125.7 |
| A Python-specific problem is the open() function. | https://realpython.com/lessons/python-unicode-summary/#t=125.7 |
| open() takes an encoding argument, but it is optional, | https://realpython.com/lessons/python-unicode-summary/#t=129.18 |
| and the default is platform-specific. If you’re opening a text file, | https://realpython.com/lessons/python-unicode-summary/#t=132.48 |
| i.e. not specifying a binary mode, and you don’t explicitly name the encoding, | https://realpython.com/lessons/python-unicode-summary/#t=137.28 |
| you will get the encoding configured in the operating system’s locale. | https://realpython.com/lessons/python-unicode-summary/#t=142.08 |
| 02:26 | https://realpython.com/lessons/python-unicode-summary/#t=146.16 |
| On a Mac, that’s UTF-8. On Windows, | https://realpython.com/lessons/python-unicode-summary/#t=146.16 |
| it is usually a legacy code page such as cp1252. | https://realpython.com/lessons/python-unicode-summary/#t=150.24 |
| You can see what the default encoding is by calling the getpreferredencoding() | https://realpython.com/lessons/python-unicode-summary/#t=155.46 |
| function of the locale module. | https://realpython.com/lessons/python-unicode-summary/#t=158.5 |
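As a minimal sketch of the advice above: the function in the locale module is spelled `locale.getpreferredencoding()`, and passing `encoding=` to `open()` removes the dependence on the platform default. The file name and contents below are made up for the example.

```python
import locale
import os
import tempfile

# The encoding open() falls back on when none is given.
# The value printed depends on the platform and its configuration.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file used only for this demonstration.
    path = os.path.join(tmp, "recipe.txt")

    # Name the encoding explicitly when writing...
    with open(path, "w", encoding="utf-8") as f:
        f.write("\u00bc cup")

    # ...and again when reading, so both sides agree.
    with open(path, encoding="utf-8") as f:
        contents = f.read()

    # Binary mode does no decoding and shows the bytes on disk.
    with open(path, "rb") as f:
        raw = f.read()

print(contents)
print(raw)
```

Naming the encoding on every text-mode `open()` call means the same script reads and writes the same bytes on every platform.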
| 02:42 | https://realpython.com/lessons/python-unicode-summary/#t=162.87 |
| Python ships with a module that represents the Unicode database. | https://realpython.com/lessons/python-unicode-summary/#t=162.87 |
| It’s called unicodedata. | https://realpython.com/lessons/python-unicode-summary/#t=166.23 |
| You can use this to do lookups on your characters or on your code points. | https://realpython.com/lessons/python-unicode-summary/#t=167.97 |
| 02:53 | https://realpython.com/lessons/python-unicode-summary/#t=173.34 |
| Let’s look at it in action. | https://realpython.com/lessons/python-unicode-summary/#t=173.34 |
| 02:58 | https://realpython.com/lessons/python-unicode-summary/#t=178.92 |
| The name() function takes a single-character str and returns the Unicode name for | https://realpython.com/lessons/python-unicode-summary/#t=178.92 |
| that character. | https://realpython.com/lessons/python-unicode-summary/#t=184.27 |
| 03:10 | https://realpython.com/lessons/python-unicode-summary/#t=190.78 |
| The lookup() function does the opposite. Given the name 'EURO SIGN', | https://realpython.com/lessons/python-unicode-summary/#t=190.78 |
| it returns the corresponding character. | https://realpython.com/lessons/python-unicode-summary/#t=194.47 |
| 03:20 | https://realpython.com/lessons/python-unicode-summary/#t=200.38 |
| By using name() and lookup() together you can go back and forth. | https://realpython.com/lessons/python-unicode-summary/#t=200.38 |
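The round trip described above can be sketched as follows. The interactive session from the video is not part of the transcript, so this reconstructs only the 'EURO SIGN' case that is mentioned by name.

```python
import unicodedata

# Name -> character
char = unicodedata.lookup("EURO SIGN")

# Character -> name
name = unicodedata.name(char)

# The code point behind the character, for reference
code_point = hex(ord(char))

print(char, name, code_point)
```

`unicodedata.name()` raises `ValueError` for a character that has no name unless a default is passed as the second argument, and `unicodedata.lookup()` raises `KeyError` for an unknown name.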
| Wikipedia has a ton of content on Unicode. There’s the Unicode article itself, | https://realpython.com/lessons/python-unicode-summary/#t=204.82 |
| and then there are breakdowns on Unicode character lists, the different sections | https://realpython.com/lessons/python-unicode-summary/#t=209.86 |
| of Unicode and how they’re blocked together, how to do the combinations, | https://realpython.com/lessons/python-unicode-summary/#t=214.6 |
| and then, of course, | https://realpython.com/lessons/python-unicode-summary/#t=218.38 |
| specifics to the encodings like UTF-8. In addition to Wikipedia, | https://realpython.com/lessons/python-unicode-summary/#t=218.95 |
| unicode.org itself has a rich amount of material and examples that you can pull | https://realpython.com/lessons/python-unicode-summary/#t=224.17 |
| from. If you’re looking for other encodings—back to Wikipedia. | https://realpython.com/lessons/python-unicode-summary/#t=229.12 |
| 03:53 | https://realpython.com/lessons/python-unicode-summary/#t=233.5 |
| There’s plenty there on ASCII, extended ASCII, | https://realpython.com/lessons/python-unicode-summary/#t=233.5 |
| Latin-1, and Windows-1252. | https://realpython.com/lessons/python-unicode-summary/#t=236.32 |
| If my babbling about digraphs and ligatures was interesting to you, | https://realpython.com/lessons/python-unicode-summary/#t=239.26 |
| Wikipedia has got even more information there as well. | https://realpython.com/lessons/python-unicode-summary/#t=243.46 |
| 04:07 | https://realpython.com/lessons/python-unicode-summary/#t=247.12 |
| Joel on Software is a great source for programmers and his blog entry on the minimum | https://realpython.com/lessons/python-unicode-summary/#t=247.12 |
| you need to know for Unicode is quite in-depth. | https://realpython.com/lessons/python-unicode-summary/#t=251.71 |
| Additionally, David Zentgraf’s article and the Mozilla article on detecting | https://realpython.com/lessons/python-unicode-summary/#t=254.71 |
| encodings also cover lots of useful information. Specific to Python, | https://realpython.com/lessons/python-unicode-summary/#t=259.54 |
| you can look at the What’s New in Python 3.0 | https://realpython.com/lessons/python-unicode-summary/#t=264.55 |
| article, which talks about how text and bytes have changed, | https://realpython.com/lessons/python-unicode-summary/#t=266.38 |
| and the default Unicode mechanisms in Python 3. | https://realpython.com/lessons/python-unicode-summary/#t=270.01 |
| 04:33 | https://realpython.com/lessons/python-unicode-summary/#t=273.28 |
| Understanding Unicode is so necessary that Python has a full how-to on it, and | https://realpython.com/lessons/python-unicode-summary/#t=273.28 |
| deep within the documentation, | https://realpython.com/lessons/python-unicode-summary/#t=277.9 |
| you can find a full listing of the supported encodings. Given the topic, | https://realpython.com/lessons/python-unicode-summary/#t=279.46 |
| it seems only appropriate to say merci, grazie, gracias. | https://realpython.com/lessons/python-unicode-summary/#t=284.62 |
| 04:49 | https://realpython.com/lessons/python-unicode-summary/#t=289.15 |
| Thanks for your attention. I hope it’s been informative. | https://realpython.com/lessons/python-unicode-summary/#t=289.15 |
| Overview | https://realpython.com/courses/python-unicode/ |
|
Unicode in Python: Working With Character Encodings (Overview) 07:56
| https://realpython.com/videos/python-unicode-overview/ |
|
Working With ASCII and the Python String Module 05:49
| https://realpython.com/videos/ascii-python-string-module/ |
|
Working in Binary: Bits, Bytes, Oct, and Hex 06:26
| https://realpython.com/lessons/bits-bytes-oct-hex/ |
|
Using Unicode 04:15
| https://realpython.com/lessons/using-unicode/ |
|
Encoding UTF-8 06:19
| https://realpython.com/lessons/encoding-utf8/ |
|
Combining Characters 05:40
| https://realpython.com/lessons/combining-characters/ |
|
Using Built-In Functions 05:38
| https://realpython.com/lessons/built-in-functions/ |
|
Using Other Encodings 04:45
| https://realpython.com/lessons/other-encodings/ |
|
Unicode in Python: Working With Character Encodings (Summary) 04:53
| https://realpython.com/lessons/python-unicode-summary/ |