René's URL Explorer Experiment


Title: Unicode in Python: Working With Character Encodings (Summary) (Video) – Real Python

Open Graph Title: Unicode in Python: Working With Character Encodings (Summary) – Real Python

Description: Well, you’ve made it through eight lessons on Unicode. You’ll recall that I started off with the basics of encoding, talked about the Python string module and the constants that are available to manipulate ASCII, took a detour down Computer Science…


Opengraph URL: https://realpython.com/lessons/python-unicode-summary/

X: @realpython


Domain: realpython.com


JSON-LD metadata:
  {
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "name": "Unicode in Python: Working With Character Encodings (Summary)",
    "description": "Well, you’ve made it through eight lessons on Unicode. You’ll recall that I started off with the basics of encoding, talked about the Python string module and the constants that are available to manipulate ASCII, took a detour down Computer Science…",
    "thumbnailUrl": ["https://files.realpython.com/media/Encodings--Number-Systems_Watermarked.906d62e907dc.jpg"],
    "uploadDate": "2020-06-30T14:00:00+00:00",
    "duration": "PT4M53S",
    
    "potentialAction": {
      "@type": "SeekToAction",
      "target": "https://realpython.com/lessons/python-unicode-summary/#t={seek_to_second_number}",
      "startOffset-input": "required name=seek_to_second_number"
    }
  }
  

author: Real Python
twitter:card: summary_large_image
twitter:image: https://files.realpython.com/media/Encodings--Number-Systems_Watermarked.906d62e907dc.jpg
og:image: https://files.realpython.com/media/Encodings--Number-Systems_Watermarked.906d62e907dc.jpg
twitter:creator: @realpython
og:type: video.episode

Links:

Unicode in Python: Working With Character Encodings https://realpython.com/courses/python-unicode/
Christopher Trudeau https://realpython.com/courses/python-unicode/#team
Recommended Tutorial https://realpython.com/python-encodings-guide/
Course Slides (.pdf) https://realpython.com/courses/python-unicode/downloads/unicode-slides/
Sample Code (.zip) https://realpython.com/courses/python-unicode/downloads/unicode-sample-code/
00:00
Well, you've made it through eight lessons on Unicode. You'll recall that I started off with the basics of encoding, talked about the Python string module and the constants that are available to manipulate ASCII, and took a detour down Computer Science Lane to talk about bits and bytes and how they can be represented in octal and hex.
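As a refresher on the number-system recap, here's a minimal sketch that uses only built-ins (ord(), bin(), oct(), hex(), and int()). The sample character, ¼, which comes up again later in this lesson, is my own choice rather than something taken from the course code.

```python
# Sample character (an illustrative choice, not from the course code).
char = "¼"
code_point = ord(char)   # integer code point: 188

print(bin(code_point))   # 0b10111100
print(oct(code_point))   # 0o274
print(hex(code_point))   # 0xbc

# int() with an explicit base parses each notation back to the same integer.
assert int("10111100", 2) == int("274", 8) == int("bc", 16) == code_point
```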
00:19
And no Unicode course would be complete without a section on Unicode. Lesson 5 covered how UTF-8 is actually represented in binary. Lesson 6 looked at digraphs, ligatures, and other kinds of combined characters.
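For the UTF-8 recap, this hypothetical helper, utf8_bits(), prints each encoded byte in binary so the lead-byte and continuation-byte bit patterns are easy to see. It's an illustrative sketch rather than code from the course, and the list[str] annotation assumes Python 3.9 or later.

```python
def utf8_bits(char: str) -> list[str]:
    """Hypothetical helper: UTF-8 bytes of a character as 8-bit binary strings."""
    return [format(byte, "08b") for byte in char.encode("utf-8")]

# ASCII stays a single byte whose high bit is 0.
print(utf8_bits("A"))   # ['01000001']
# U+00BC takes two bytes: a 110xxxxx lead byte and a 10xxxxxx continuation byte.
print(utf8_bits("¼"))   # ['11000010', '10111100']
# U+20AC takes three bytes: a 1110xxxx lead byte and two continuation bytes.
print(utf8_bits("€"))   # ['11100010', '10000010', '10101100']
```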
00:33
Lesson 7 gave a tour of built-in Python functions that are helpful when dealing with Unicode or byte conversion. And the last lesson covered encodings other than UTF-8.
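To tie the last two recaps together, here's a small sketch showing ord() and chr() alongside str.encode() with a few of Python's standard codecs. The sample character é is my own choice. I use "utf-16-le" rather than plain "utf-16" because the plain codec prepends a byte order mark (BOM) when encoding, which would obscure the comparison.

```python
text = "é"

# ord() and chr() convert between a character and its integer code point.
assert ord(text) == 0xE9 and chr(0xE9) == text

# The same character produces different bytes under different encodings.
for encoding in ("utf-8", "latin-1", "utf-16-le"):
    print(f"{encoding}: {text.encode(encoding)!r}")
# utf-8: b'\xc3\xa9'
# latin-1: b'\xe9'
# utf-16-le: b'\xe9\x00'
```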
00:45
In this lesson, I'm going to talk about a couple of remaining corner cases and point you to some references and possible further reading. It's important to remember that all input is bytes until it's decoded.
00:57
If you assume the data's encoding, you may run into trouble. Let's say you were accessing a recipe site's API and got the following chunk of data.
01:09
If you make an assumption about the decoding…
01:15
you could be in trouble. On its own, the byte 0xbc isn't valid UTF-8: its bit pattern, 10111100, marks it as a continuation byte, which can only follow the lead byte of a multibyte sequence.
01:23
Change the encoding to Latin-1, and all of a sudden the data makes an awful lot more sense.
01:31
In UTF-8, the one-quarter symbol isn't the single byte bc but the two bytes c2 bc. There are worse cases than getting an exception. At least when you get an exception, you know something went wrong.
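The recipe payload itself isn't reproduced in this transcript, so the sketch below assumes a hypothetical chunk of data that starts with the byte 0xbc. It shows the UTF-8 decode failing, the Latin-1 decode succeeding, and the two-byte UTF-8 form of ¼.

```python
# Hypothetical payload: assumes the recipe line begins with the byte 0xbc.
data = b"\xbc cup of flour"

try:
    data.decode("utf-8")
except UnicodeDecodeError as err:
    print("UTF-8 failed:", err.reason)   # invalid start byte

print(data.decode("latin-1"))            # ¼ cup of flour

# In UTF-8, the same character takes two bytes.
assert "¼".encode("utf-8") == b"\xc2\xbc"
```

Note that Latin-1 maps every possible byte to a character, so it will never raise. That makes it a convenient fallback, but it also means a successful Latin-1 decode doesn't prove Latin-1 was the right guess.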
01:43
Consider the following piece of Norse. Encoding it…
01:51
and then accidentally decoding it as UTF-16 results in a different character. No error, no exception. Your data is now dirty, and wherever you put it, it'll be wrong.
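The Norse example from the video isn't reproduced here, so this sketch uses a sample word of my own choosing to show the same silent failure. I decode with "utf-16-le" instead of plain "utf-16" so the output doesn't hinge on which byte order the plain codec assumes when the input has no BOM.

```python
# Sample text (an illustrative choice, not the lesson's example).
original = "jörð"
raw = original.encode("utf-8")       # 6 bytes: b'j\xc3\xb6r\xc3\xb0'

# Decoding with the wrong codec raises nothing; it just yields different text.
garbled = raw.decode("utf-16-le")

print(len(original), len(garbled))   # 4 3
print(garbled == original)           # False
```

Each pair of UTF-8 bytes is read as one 16-bit code unit, so the six bytes collapse into three unrelated characters without any complaint from Python.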
02:05
A Python-specific problem is the open() function. open() accepts an encoding argument, but it's optional, and the default is platform-specific. If you're opening a file in text mode (that is, not specifying a binary mode) and you don't explicitly name the encoding, you'll get the operating system's preferred encoding.
02:26
On a Mac, that's UTF-8. On Windows, it's commonly a legacy code page such as cp1252, although newer systems can be configured to use UTF-8. You can see what the default encoding is by calling the getpreferredencoding() function in the locale module.
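Here's a minimal sketch of checking that default and sidestepping it by naming the encoding explicitly. The file name recipe.txt is hypothetical, and the printed preferred encoding will differ by operating system, locale settings, and whether Python's UTF-8 Mode is enabled.

```python
import locale
import os
import tempfile

# Roughly what open() falls back to in text mode when no encoding is given.
print(locale.getpreferredencoding(False))

text = "¼ cup of flour"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "recipe.txt")   # hypothetical file name
    # Naming the encoding explicitly makes the round trip portable.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, encoding="utf-8") as f:
        assert f.read() == text
    # Binary mode skips decoding and shows the raw bytes on disk.
    with open(path, "rb") as f:
        print(f.read())   # b'\xc2\xbc cup of flour'
```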
02:42
Python ships with a module that exposes the Unicode Character Database. It's called unicodedata. You can use it to look up information about your characters or code points.
02:53
Let's look at it in action.
02:58
The name() function takes a str containing a single character and returns the Unicode name for that character.
03:10
The lookup() function does the opposite. Given the name 'EURO SIGN', it returns the corresponding character.
03:20
By using name() and lookup() together, you can go back and forth.
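A short sketch of both functions from the standard library's unicodedata module. The characters tested beyond the lesson's EURO SIGN example are my own picks.

```python
import unicodedata

# name(): character -> official Unicode name
print(unicodedata.name("€"))             # EURO SIGN
print(unicodedata.name("¼"))             # VULGAR FRACTION ONE QUARTER

# lookup(): official Unicode name -> character
print(unicodedata.lookup("EURO SIGN"))   # €

# Round trip: lookup(name(c)) gives back the original character.
for char in "€¼é":
    assert unicodedata.lookup(unicodedata.name(char)) == char

# Some code points (e.g. control characters) have no name; name() raises
# ValueError for them unless you supply a default.
print(unicodedata.name("\x00", "<no name>"))   # <no name>
```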
Wikipedia has a ton of content on Unicode. There's the Unicode article itself, as well as breakdowns covering lists of Unicode characters, the different sections of Unicode and how they're grouped into blocks, how combining characters work, and, of course, the specifics of encodings like UTF-8. In addition to Wikipedia, unicode.org itself has a wealth of material and examples that you can draw from. If you're looking for other encodings, head back to Wikipedia.
03:53
There's plenty there on ASCII, extended ASCII, Latin-1, and Windows-1252. If my babbling about digraphs and ligatures interested you, Wikipedia has even more information on those as well.
04:07
Joel on Software is a great resource for programmers, and Joel Spolsky's blog post on the absolute minimum you need to know about Unicode is quite in-depth. Additionally, David Zentgraf's article and the Mozilla article on detecting encodings cover lots of useful information. Specific to Python, you can look at the What's New in Python 3.0 document, which describes how text and bytes changed and the default Unicode mechanisms in Python 3.
04:33
Understanding Unicode is so necessary that Python has a full how-to on it, and deep within the documentation, you can find a full listing of the supported encodings. Given the topic, it seems only appropriate to say merci, grazie, gracias.
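As a small companion to that listing of supported encodings, here's a sketch using the standard codecs module to check whether Python recognizes an encoding name and what canonical name it resolves to. The alias choices are mine, and "not-a-real-encoding" is a deliberately made-up name.

```python
import codecs

# codecs.lookup() resolves an encoding name or alias to its canonical codec.
for alias in ("latin-1", "UTF8", "cp1252"):
    print(alias, "->", codecs.lookup(alias).name)
# latin-1 -> iso8859-1
# UTF8 -> utf-8
# cp1252 -> cp1252

# Unknown names raise LookupError.
try:
    codecs.lookup("not-a-real-encoding")   # deliberately made-up name
except LookupError as err:
    print("LookupError:", err)
```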
04:49
Thanks for your attention. I hope it's been informative.
Overview https://realpython.com/courses/python-unicode/
Unicode in Python: Working With Character Encodings (Overview) 07:56 https://realpython.com/videos/python-unicode-overview/
Working With ASCII and the Python String Module 05:49 https://realpython.com/videos/ascii-python-string-module/
Working in Binary: Bits, Bytes, Oct, and Hex 06:26 https://realpython.com/lessons/bits-bytes-oct-hex/
Using Unicode 04:15 https://realpython.com/lessons/using-unicode/
Encoding UTF-8 06:19 https://realpython.com/lessons/encoding-utf8/
Combining Characters 05:40 https://realpython.com/lessons/combining-characters/
Using Built-In Functions 05:38 https://realpython.com/lessons/built-in-functions/
Using Other Encodings 04:45 https://realpython.com/lessons/other-encodings/
Unicode in Python: Working With Character Encodings (Summary) 04:53 https://realpython.com/lessons/python-unicode-summary/
