René's URL Explorer Experiment


Title: Unicode in Python: Working With Character Encodings (Overview) (Video) – Real Python

Open Graph Title: Unicode in Python: Working With Character Encodings (Overview) – Real Python

Description: Welcome to Unicode and Character Encodings in Python. My name is Chris and I will be your guide. This course talks about what an encoding is and how it works, where ASCII came from and how it evolved, how binary bits can be described in oct and hex…

Opengraph URL: https://realpython.com/videos/python-unicode-overview/

X: @realpython

Domain: realpython.com


Hey, it has json ld scripts:
  {
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "name": "Unicode in Python: Working With Character Encodings (Overview)",
    "description": "Welcome to Unicode and Character Encodings in Python. My name is Chris and I will be your guide. This course talks about what an encoding is and how it works, where ASCII came from and how it evolved, how binary bits can be described in oct and hex…",
    "thumbnailUrl": ["https://files.realpython.com/media/Encodings--Number-Systems_Watermarked.906d62e907dc.jpg"],
    "uploadDate": "2020-06-30T14:00:00+00:00",
    "duration": "PT7M56S",
    "embedUrl": "https://player.vimeo.com/video/431623973",
    "potentialAction": {
      "@type": "SeekToAction",
      "target": "https://realpython.com/videos/python-unicode-overview/#t={seek_to_second_number}",
      "startOffset-input": "required name=seek_to_second_number"
    }
  }
  

author: Real Python
twitter:card: summary_large_image
twitter:image: https://files.realpython.com/media/Encodings--Number-Systems_Watermarked.906d62e907dc.jpg
og:image: https://files.realpython.com/media/Encodings--Number-Systems_Watermarked.906d62e907dc.jpg
twitter:creator: @realpython
og:type: video.episode

Links:

Unicode in Python: Working With Character Encodingshttps://realpython.com/courses/python-unicode/
Christopher Trudeauhttps://realpython.com/courses/python-unicode/#team
Recommended Tutorialhttps://realpython.com/python-encodings-guide/
Course Slides (.pdf)https://realpython.com/courses/python-unicode/downloads/unicode-slides/
Sample Code (.zip)https://realpython.com/courses/python-unicode/downloads/unicode-sample-code/
00:00https://realpython.com/videos/python-unicode-overview/#t=0.48
Welcome to Unicode and Character Encodings in Python.https://realpython.com/videos/python-unicode-overview/#t=0.48
My name is Chris and I will be your guide.https://realpython.com/videos/python-unicode-overview/#t=3.36
This course talks about what an encoding is and how it works,https://realpython.com/videos/python-unicode-overview/#t=6.09
where ASCII came from and how it evolved,https://realpython.com/videos/python-unicode-overview/#t=9.78
how binary bits can be described in oct and hex and how to use those to maphttps://realpython.com/videos/python-unicode-overview/#t=13.14
to code points, the Unicode standard and the UTF-8 encoding thereof,https://realpython.com/videos/python-unicode-overview/#t=16.89
how UTF-8 uses the underlying bits to encode a code point,https://realpython.com/videos/python-unicode-overview/#t=22.14
how multiple code points can result in a single character or glyph, functionshttps://realpython.com/videos/python-unicode-overview/#t=26.28
built into Python that can help you when you’re messing around with charactershttps://realpython.com/videos/python-unicode-overview/#t=30.6
in Unicode, and other encodings. First off, the handling of strings and character encoding ishttps://realpython.com/videos/python-unicode-overview/#t=34.08
one of the big changes between Python 2 and Python 3. In fact,https://realpython.com/videos/python-unicode-overview/#t=39.24
it’s one of the better reasons to move from Python 2 to Python 3.https://realpython.com/videos/python-unicode-overview/#t=42.54
00:46https://realpython.com/videos/python-unicode-overview/#t=46.23
All the examples in this course will be Python 3 based. If you’re using ahttps://realpython.com/videos/python-unicode-overview/#t=46.23
Python 2 interpreter, you’re not going to be able to follow along.https://realpython.com/videos/python-unicode-overview/#t=50.04
It’s really easy to forget when you’re programming in a nice high-level languagehttps://realpython.com/videos/python-unicode-overview/#t=53.61
like Python that computers really only understand numbers.https://realpython.com/videos/python-unicode-overview/#t=56.82
01:00https://realpython.com/videos/python-unicode-overview/#t=60.57
When you’re dealing with text,https://realpython.com/videos/python-unicode-overview/#t=60.57
you’re actually dealing with a mapping between a number and a character that ishttps://realpython.com/videos/python-unicode-overview/#t=61.89
being displayed.https://realpython.com/videos/python-unicode-overview/#t=66.06
The fundamental item that is being stored in memory is still a number. ASCII washttps://realpython.com/videos/python-unicode-overview/#t=67.59
one of the preeminent standards for this kind of mapping.https://realpython.com/videos/python-unicode-overview/#t=72.9
01:16https://realpython.com/videos/python-unicode-overview/#t=76.17
It specified that certain numbers represented certain letters,https://realpython.com/videos/python-unicode-overview/#t=76.17
and so when the computer used those numbers in the context of a stringhttps://realpython.com/videos/python-unicode-overview/#t=79.8
it would produce the right letters.https://realpython.com/videos/python-unicode-overview/#t=83.91
01:26https://realpython.com/videos/python-unicode-overview/#t=86.19
The problem with ASCII was it really only encoded the Latin alphabet.https://realpython.com/videos/python-unicode-overview/#t=86.19
It didn’t even include accented characters.https://realpython.com/videos/python-unicode-overview/#t=89.13
It was invented by and for English speakers;https://realpython.com/videos/python-unicode-overview/#t=91.5
it wasn’t until later that accents for other Western languages were added. Byhttps://realpython.com/videos/python-unicode-overview/#t=93.99
contrast,https://realpython.com/videos/python-unicode-overview/#t=98.64
Unicode is an international standard and has enough space to encode all writtenhttps://realpython.com/videos/python-unicode-overview/#t=99.3
languages. In fact, it has space to encode other things as well,https://realpython.com/videos/python-unicode-overview/#t=103.62
like emojis. At one point in time, there was even a move to add Klingon to it,https://realpython.com/videos/python-unicode-overview/#t=107.88
but it was turned down. But there’s still space left overhttps://realpython.com/videos/python-unicode-overview/#t=112.11
if the standards body changes its mind. First off, a little history.https://realpython.com/videos/python-unicode-overview/#t=115.11
02:00https://realpython.com/videos/python-unicode-overview/#t=120.03
I think I mentioned that computers only understand numbers? Well,https://realpython.com/videos/python-unicode-overview/#t=120.03
computers only understand numbers. In fact, it’s even worse than that—https://realpython.com/videos/python-unicode-overview/#t=123.03
they really only understand binary. Everything is a 1 or a 0.https://realpython.com/videos/python-unicode-overview/#t=126.6
02:10https://realpython.com/videos/python-unicode-overview/#t=130.8
This goes down to how transistors work—they’re either on or off.https://realpython.com/videos/python-unicode-overview/#t=130.8
So, inside of the computer, everything is represented as either True orhttps://realpython.com/videos/python-unicode-overview/#t=134.28
False, on or off, or 1 or 0 to represent that. Everything on top of that is anhttps://realpython.com/videos/python-unicode-overview/#t=137.49
abstraction.https://realpython.com/videos/python-unicode-overview/#t=144.3
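
To make the "everything is a 1 or a 0" point concrete, here is a small sketch using only Python built-ins. The value 65 is an arbitrary choice for illustration and does not come from the course's sample code.

```python
# Illustrative only: 65 is an arbitrary example value.
value = 65

# format(..., "08b") renders the integer as eight binary digits.
bits = format(value, "08b")
print(bits)  # 01000001

# int(..., 2) reads the bit string back as the same number.
round_trip = int(bits, 2)
print(round_trip)  # 65
```

The number and the bit string are two views of the same stored value; the decimal form is just the friendlier abstraction.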
02:25https://realpython.com/videos/python-unicode-overview/#t=145.86
A byte is a grouping of bits. In the early history of computers,https://realpython.com/videos/python-unicode-overview/#t=145.86
the size of a byte was different on different machines.https://realpython.com/videos/python-unicode-overview/#t=149.79
By the time PCs came around,https://realpython.com/videos/python-unicode-overview/#t=152.67
there were 8 bits to a byte, and that’s pretty common now.https://realpython.com/videos/python-unicode-overview/#t=154.41
02:37https://realpython.com/videos/python-unicode-overview/#t=157.77
Now, most processors deal with more than one byte at a time,https://realpython.com/videos/python-unicode-overview/#t=157.77
but instead of redefining how big a byte is,https://realpython.com/videos/python-unicode-overview/#t=160.71
they have other terms like word for groupings of bytes.https://realpython.com/videos/python-unicode-overview/#t=162.96
02:46https://realpython.com/videos/python-unicode-overview/#t=166.47
An 8-bit byte can hold 2^8 combinations—https://realpython.com/videos/python-unicode-overview/#t=166.47
that’s 256. The counting starts at 0,https://realpython.com/videos/python-unicode-overview/#t=169.92
so the number range, instead of being from 1 to 256, is from 0 to 255. Back inhttps://realpython.com/videos/python-unicode-overview/#t=173.76
03:00https://realpython.com/videos/python-unicode-overview/#t=180.22
the olden times (and I’m talking about timeshttps://realpython.com/videos/python-unicode-overview/#t=180.22
so old that even an old man like me thinks they’re the past), IBM introduced BCD,https://realpython.com/videos/python-unicode-overview/#t=182.12
or Binary Coded Decimal. This was an early encoding. It was very,https://realpython.com/videos/python-unicode-overview/#t=186.79
very simple and very small. It used 6 bits to represent a character.https://realpython.com/videos/python-unicode-overview/#t=190.6
03:15https://realpython.com/videos/python-unicode-overview/#t=195.22
This wasn’t enough to even fully cover the English language,https://realpython.com/videos/python-unicode-overview/#t=195.22
so IBM extended BCD with EBCDIC—Extended Binaryhttps://realpython.com/videos/python-unicode-overview/#t=197.8
Coded Decimal Interchange Code.https://realpython.com/videos/python-unicode-overview/#t=202.54
03:25https://realpython.com/videos/python-unicode-overview/#t=205.33
This used a full 8 bits to describe a character and was so advancedhttps://realpython.com/videos/python-unicode-overview/#t=205.33
it actually included lowercase letters. Around the same time that EBCDIC was beinghttps://realpython.com/videos/python-unicode-overview/#t=209.53
standardized, ASCII was introduced.https://realpython.com/videos/python-unicode-overview/#t=214.03
03:36https://realpython.com/videos/python-unicode-overview/#t=216.55
ASCII was put together by a standards body rather than by a single company andhttps://realpython.com/videos/python-unicode-overview/#t=216.55
became more popular across different platforms.https://realpython.com/videos/python-unicode-overview/#t=220.09
ASCII only required 7 bits,https://realpython.com/videos/python-unicode-overview/#t=223.39
but at the time most computers were using an 8-bit byte,https://realpython.com/videos/python-unicode-overview/#t=225.46
so the lead bit was just left as 0. Sometimes, with transmission protocols such ashttps://realpython.com/videos/python-unicode-overview/#t=228.55
those used over modems or terminals, that 8th bit would be used as a parity bit tohttps://realpython.com/videos/python-unicode-overview/#t=233.8
make sure that the byte had been transmitted correctly.https://realpython.com/videos/python-unicode-overview/#t=238.21
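
The parity idea can be sketched in a few lines. The transcript does not say which parity scheme was used, so this example assumes even parity, and the helper names `add_even_parity` and `parity_ok` are made up for illustration.

```python
def add_even_parity(code: int) -> int:
    """Return an 8-bit value: the 7-bit code plus an even-parity bit on top."""
    if not 0 <= code < 128:
        raise ValueError("expected a 7-bit value (0-127)")
    ones = bin(code).count("1")
    parity = ones % 2  # 1 only when the count of 1 bits is odd
    return (parity << 7) | code


def parity_ok(byte: int) -> bool:
    """True when the total number of 1 bits in the byte is even."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("A"))  # 'A' is 65, which has two 1 bits
print(format(sent, "08b"))        # 01000001

corrupted = sent ^ 0b00000100     # simulate one bit flipped in transit
print(parity_ok(sent), parity_ok(corrupted))  # True False
```

A single flipped bit changes the count of 1 bits from even to odd, which is what the receiver checks for. Two flipped bits would go unnoticed, which is one reason parity is only a basic check.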
04:01https://realpython.com/videos/python-unicode-overview/#t=241.9
ASCII was adopted as an international standard in 1967, and quicklyhttps://realpython.com/videos/python-unicode-overview/#t=241.9
there were several iterations and extensions made on top of it.https://realpython.com/videos/python-unicode-overview/#t=245.92
The extended ASCII format moved to a full 8 bits of description and addedhttps://realpython.com/videos/python-unicode-overview/#t=249.49
accent characters,https://realpython.com/videos/python-unicode-overview/#t=253.93
allowing Western languages that were not English to be described.https://realpython.com/videos/python-unicode-overview/#t=255.25
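
As a rough illustration of what the 8th bit bought, the sketch below uses Latin-1 (ISO 8859-1) as one example of an 8-bit extension of ASCII. The transcript does not name a specific extended format, so the choice of Latin-1 is an assumption.

```python
text = "\u00e9"  # the accented letter é

# Latin-1 stores this letter in a single byte whose value is above 127.
encoded = text.encode("latin-1")
print(encoded)     # b'\xe9'
print(encoded[0])  # 233

# Plain 7-bit ASCII has no slot for it, so encoding fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)    # False
```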
04:19https://realpython.com/videos/python-unicode-overview/#t=259.18
PCs used ASCII, so when they became the de facto standard,https://realpython.com/videos/python-unicode-overview/#t=259.18
ASCII became the way of communicating between computers. For clarity’s sake,https://realpython.com/videos/python-unicode-overview/#t=262.33
let’s establish some common terminology. First off, what’s a character?https://realpython.com/videos/python-unicode-overview/#t=266.59
04:31https://realpython.com/videos/python-unicode-overview/#t=271.24
This probably feels clear to you—it’s that one little single unit of text—https://realpython.com/videos/python-unicode-overview/#t=271.24
but this term can actually get a little confusing depending on who you’rehttps://realpython.com/videos/python-unicode-overview/#t=275.38
talking to. So for the purposes of this course,https://realpython.com/videos/python-unicode-overview/#t=278.62
the word character is going to mean a minimal unit of text that has a semantichttps://realpython.com/videos/python-unicode-overview/#t=281.47
value. So, that includes things like emojis, or symbols in Han Chinese,https://realpython.com/videos/python-unicode-overview/#t=285.64
as well as obvious stuff like the letter A.https://realpython.com/videos/python-unicode-overview/#t=290.08
04:52https://realpython.com/videos/python-unicode-overview/#t=292.66
A character set is just a collection of these characters,https://realpython.com/videos/python-unicode-overview/#t=292.66
and these sets can be used across multiple languages.https://realpython.com/videos/python-unicode-overview/#t=295.66
Think about the Latin character set that most European languages can use,https://realpython.com/videos/python-unicode-overview/#t=298.84
the Greek character set that pretty much only the Greek language can use,https://realpython.com/videos/python-unicode-overview/#t=302.95
and the Cyrillic character set, which is used across certain Slavic languages.https://realpython.com/videos/python-unicode-overview/#t=306.34
05:10https://realpython.com/videos/python-unicode-overview/#t=310.93
A code point is a number that represents a single character in one of these setshttps://realpython.com/videos/python-unicode-overview/#t=310.93
of encoded characters. For example, in the ASCII standard,https://realpython.com/videos/python-unicode-overview/#t=315.94
the capital letter 'A' is the decimal number 65.https://realpython.com/videos/python-unicode-overview/#t=320.86
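
The mapping between a character and its code point can be checked directly with Python's built-in `ord()` and `chr()`. This is a minimal sketch, not taken from the course's sample code.

```python
# ord() goes from a character to its code point.
point = ord("A")
print(point)       # 65

# chr() goes the other way, from a code point to the character.
letter = chr(65)
print(letter)      # A

# The same code point written in hexadecimal.
print(hex(point))  # 0x41
```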
05:25https://realpython.com/videos/python-unicode-overview/#t=325.69
A code unit, in contrast to a code point, is a sequence of bits that representshttps://realpython.com/videos/python-unicode-overview/#t=325.69
that code point. In ASCII,https://realpython.com/videos/python-unicode-overview/#t=331.03
the code point 65 means 'A', and it’s stored in the computer using that number.https://realpython.com/videos/python-unicode-overview/#t=333.16
05:37https://realpython.com/videos/python-unicode-overview/#t=337.72
In other encoding standards that mapping may not apply.https://realpython.com/videos/python-unicode-overview/#t=337.72
As I mentioned before, in the original ASCII standard,https://realpython.com/videos/python-unicode-overview/#t=340.72
a code unit was 7 bits long,https://realpython.com/videos/python-unicode-overview/#t=343.87
so that covered from the numbers 0 to 127. Unicode supports different kinds ofhttps://realpython.com/videos/python-unicode-overview/#t=346.0
encodings, and some of those even have varying length code units.https://realpython.com/videos/python-unicode-overview/#t=351.52
05:55https://realpython.com/videos/python-unicode-overview/#t=355.87
UTF-8, one of those encodings, is an 8-bit encoding,https://realpython.com/videos/python-unicode-overview/#t=355.87
but a code point can map to 1, 2, 3, or 4 code units,https://realpython.com/videos/python-unicode-overview/#t=359.81
so multiple bytes may be describing a single code point.https://realpython.com/videos/python-unicode-overview/#t=364.61
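
The variable length is easy to see by encoding a few characters and counting the resulting bytes. The four sample characters are chosen here for illustration (one for each possible length) and are not from the course.

```python
# One sample per UTF-8 length: A, é, € and a grinning-face emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")
    lengths.append(len(encoded))
    # bytes.hex() with a separator needs Python 3.8 or later.
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")

print(lengths)  # [1, 2, 3, 4]
```

Each element of a string is still one code point; only the number of bytes needed to store it in UTF-8 changes.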
06:08https://realpython.com/videos/python-unicode-overview/#t=368.96
That’s enough background.https://realpython.com/videos/python-unicode-overview/#t=368.96
Let’s look at some code. In order to inspect some strings,https://realpython.com/videos/python-unicode-overview/#t=369.98
I’ve written a quick little function inside of a file called show.py. The corehttps://realpython.com/videos/python-unicode-overview/#t=373.19
part of this function is line 5,https://realpython.com/videos/python-unicode-overview/#t=377.78
which uses the built-in ord() function, returning the code point of the characterhttps://realpython.com/videos/python-unicode-overview/#t=380.42
that is passed in.https://realpython.com/videos/python-unicode-overview/#t=385.01
06:29https://realpython.com/videos/python-unicode-overview/#t=389.63
I’m going to import that function into the REPL and start with a simple stringhttps://realpython.com/videos/python-unicode-overview/#t=389.63
in English saying 'Hello there'.https://realpython.com/videos/python-unicode-overview/#t=394.91
Calling code_points() on that prints out the code point for each one ofhttps://realpython.com/videos/python-unicode-overview/#t=396.79
the values in the str (string).https://realpython.com/videos/python-unicode-overview/#t=400.91
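
The contents of show.py are not reproduced in this transcript, so the following is only a guess at what such a helper might look like. The name `code_points` and the use of `ord()` come from the transcript; the output format is an assumption.

```python
def code_points(text: str) -> None:
    """Print each character of the text next to its code point."""
    for char in text:
        # ord() returns the code point of a single character.
        print(f"{char!r}: {ord(char)}")


code_points("Hello there")
```

Printing the `repr()` of each character keeps the space visible in the output as `' '`.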
06:42https://realpython.com/videos/python-unicode-overview/#t=402.86
If you look at this, capital 'H' is 72 in ASCII,https://realpython.com/videos/python-unicode-overview/#t=402.86
so it maps down below. Six characters in, you’ll see 32—https://realpython.com/videos/python-unicode-overview/#t=406.88
that’s a space (" ") in ASCII. Notice that every one of these numbers is below 128—https://realpython.com/videos/python-unicode-overview/#t=411.77
that means they’re in the range of the original ASCII 7-bit standard.https://realpython.com/videos/python-unicode-overview/#t=417.74
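
The "everything is below 128" observation can be verified without reading the numbers by eye. This sketch uses only built-ins; `str.isascii()` requires Python 3.7 or later.

```python
greeting = "Hello there"
points = [ord(ch) for ch in greeting]

# First character and the sixth character (index 5).
print(points[0], points[5])  # 72 32

# 7 bits give 128 values (0-127); 8 bits give 256 values (0-255).
print(2 ** 7, 2 ** 8)        # 128 256

# Every code point fits in the 7-bit ASCII range.
all_seven_bit = all(p < 2 ** 7 for p in points)
print(all_seven_bit)         # True
print(greeting.isascii())    # True
```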
07:02https://realpython.com/videos/python-unicode-overview/#t=422.15
Let’s look at something a little more challenging.https://realpython.com/videos/python-unicode-overview/#t=422.15
07:05https://realpython.com/videos/python-unicode-overview/#t=425.18
Here’s some Russian that says “da svidaniya”, or at least, that’s what the web page Ihttps://realpython.com/videos/python-unicode-overview/#t=425.18
copied it from said it did—https://realpython.com/videos/python-unicode-overview/#t=430.1
I hope it says that. Running code_points() on it,https://realpython.com/videos/python-unicode-overview/#t=431.57
07:16https://realpython.com/videos/python-unicode-overview/#t=436.88
you get a significantly larger set of numbers.https://realpython.com/videos/python-unicode-overview/#t=436.88
Now, the third character in is 32—a space—just like in 'Hello there'.https://realpython.com/videos/python-unicode-overview/#t=440.15
And if you look near the end here, there’s a character that’s 225,https://realpython.com/videos/python-unicode-overview/#t=445.46
which is below 256, in the extended ASCII range.https://realpython.com/videos/python-unicode-overview/#t=449.57
07:34https://realpython.com/videos/python-unicode-overview/#t=454.37
That is the accentedhttps://realpython.com/videos/python-unicode-overview/#t=454.37
'á'. Everything else here is from the Cyrillic alphabet,https://realpython.com/videos/python-unicode-overview/#t=456.23
which has much higher code point numbers above the ASCII range. All of these,https://realpython.com/videos/python-unicode-overview/#t=460.04
as you’ll notice, are sort of around a thousand.https://realpython.com/videos/python-unicode-overview/#t=464.63
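
The exact string used in the video is not included in this transcript, and it evidently contained a Latin 'á', so the spelling below is an assumption chosen only to show the Cyrillic range.

```python
# Assumed spelling; not necessarily the string used in the video.
phrase = "до свидания"
points = [ord(ch) for ch in phrase]
print(points)
# [1076, 1086, 32, 1089, 1074, 1080, 1076, 1072, 1085, 1080, 1103]

# The accented Latin letter mentioned in the transcript.
accented = ord("\u00e1")
print(accented)  # 225

# Cyrillic letters live in the block U+0400 to U+04FF (1024 to 1279).
cyrillic = [p for p in points if 0x0400 <= p <= 0x04FF]
print(len(cyrillic))  # 10
```

The space is still 32, exactly as in the English string, while the Cyrillic letters all sit a little above a thousand.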
07:48https://realpython.com/videos/python-unicode-overview/#t=468.35
That’s it for the introduction. Next up,https://realpython.com/videos/python-unicode-overview/#t=468.35
I’ll dive deeper into Python strings and their relationship to ASCII.https://realpython.com/videos/python-unicode-overview/#t=470.45
Unicode in Python: Working With Character Encodings (Overview) 07:56 https://realpython.com/videos/python-unicode-overview/
Working With ASCII and the Python String Module 05:49 https://realpython.com/videos/ascii-python-string-module/
Working in Binary: Bits, Bytes, Oct, and Hex 06:26 https://realpython.com/lessons/bits-bytes-oct-hex/
Using Unicode 04:15 https://realpython.com/lessons/using-unicode/
Encoding UTF-8 06:19 https://realpython.com/lessons/encoding-utf8/
Combining Characters 05:40 https://realpython.com/lessons/combining-characters/
Using Built-In Functions 05:38 https://realpython.com/lessons/built-in-functions/
Using Other Encodings 04:45 https://realpython.com/lessons/other-encodings/
Unicode in Python: Working With Character Encodings (Summary) 04:53 https://realpython.com/lessons/python-unicode-summary/