Title: dashes in word_count.txt cause errors with WordCount.py · Issue #12 · jleetutorial/python-spark-tutorial · GitHub
Description: Issue: The ndash characters in word_count.txt cause an error when following the "Run your first Spark Job" tutorial. There are only two occurrences of this character, here: "from 1913–74." and here: "near–bankruptcy". To Recreate: using spa...
Opengraph URL: https://github.com/jleetutorial/python-spark-tutorial/issues/12
Author: HarryCaveMan (https://github.com/HarryCaveMan)
Date published: 2018-11-04T05:16:14.000Z
URL: https://github.com/jleetutorial/python-spark-tutorial/issues/12
Comments: 1

### Issue:
The `ndash` characters in `word_count.txt` cause an error when following the "Run your first Spark Job" tutorial. There are only two occurrences of this character, here: "`from 1913–74.`" and here: "`near–bankruptcy`".

#### To Recreate:
Using `spark-2.3.2-bin-hadoop2.7` on Ubuntu 18 with pyspark/Python 2.7, installed following the instructions from lecture 5, go to the directory where you cloned `python-spark-tutorial` and run the following from lecture 6:

`spark-submit ./rdd/WordCount.py`

The execution halts about halfway through the frequency counter with the following error:

```
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 4: ordinal not in range(128)
```
Spoiler, it's the dash. I'm not sure whether or not the Unicode en dash was intentional, so I'm posting.

#### Work-Around:

I changed the two `ndash` characters to "`from 1913-74.`" and "`near-bankruptcy`", which solved the issue for me. Related [stackoverflow thread](https://stackoverflow.com/questions/20329896/python-2-7-character-u2013) where someone else ran into a similar problem with Python 2.7 and used the same solution.
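For anyone who wants to check the diagnosis without running Spark, here is a minimal sketch of the failure and the work-around. It only simulates the ASCII encoding step that Python 2.7 appears to apply when printing a `unicode` value to an ASCII-only stream; it is not the code in `WordCount.py`. The helper names `first_non_ascii_position` and `replace_en_dashes` are made up for this illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

EN_DASH = u"\u2013"  # the character named in the error message


def first_non_ascii_position(text):
    """Return the index of the first character ASCII cannot encode, or None.

    Hypothetical helper: mimics the implicit ASCII encode that fails in the
    issue, but reports the position instead of raising.
    """
    try:
        text.encode("ascii")
    except UnicodeEncodeError as err:
        return err.start
    return None


def replace_en_dashes(text):
    """Swap every en dash for a plain ASCII hyphen (the issue's work-around)."""
    return text.replace(EN_DASH, u"-")


word = u"near\u2013bankruptcy"          # one of the two words from word_count.txt
print(first_non_ascii_position(word))   # index of the en dash in this word

fixed = replace_en_dashes(word)
print(fixed)
print(first_non_ascii_position(fixed))  # None once the dash is replaced
```

For `near–bankruptcy` the en dash sits at index 4, which matches the "position 4" in the reported error. An alternative that leaves the data file untouched would be to encode explicitly (for example `word.encode("utf-8")`) before printing, though that was not tried in this issue.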