Title: Line normalization before diff · Issue #219 · java-diff-utils/java-diff-utils · GitHub
Description: The DiffRowGenerator class offers the lineNormalizer property. By default, it is used to replace < and > by their escaped versions &lt; and &gt;. The lineNormalizer is applied to the input texts before the diff is calculated....
Opengraph URL: https://github.com/java-diff-utils/java-diff-utils/issues/219
Author: [epictecch](https://github.com/epictecch), published 2025-09-22T14:09:33.000Z, 3 comments

**Description**

The `DiffRowGenerator` class offers the `lineNormalizer` property. By default, it is used to replace `<` and `>` with their escaped versions `&lt;` and `&gt;`.

The `lineNormalizer` is applied to the input texts before the diff is calculated. While I see this as a useful feature, with the default settings it may be surprising that the resulting text is no longer correctly HTML-escaped:

```java
import com.github.difflib.text.DiffRow;
import com.github.difflib.text.DiffRowGenerator;
import java.util.List;
import java.util.stream.Collectors;
import org.apache.commons.lang3.StringUtils;

// Default lineNormalizer stays in place; inline diffs are computed word by word.
final var generator = DiffRowGenerator.create()
        .mergeOriginalRevised(true)
        .showInlineDiffs(true)
        .inlineDiffByWord(true)
        .build();

final var rows = generator.generateDiffRows(List.of("hello <world>"), List.of("bye >world<"));

// With mergeOriginalRevised(true), the merged markup is returned in the old line.
final var resultingText = rows.stream()
        .map(DiffRow::getOldLine)
        .collect(Collectors.joining(StringUtils.LF));
```

The resulting text is
```
<span class="editOldInline">hello</span><span class="editNewInline">bye</span> &<span class="editOldInline">lt</span><span class="editNewInline">gt</span>;world&<span class="editOldInline">gt</span><span class="editNewInline">lt</span>;
```

Note that the part ` &` is treated as unchanged text because both replacements `&lt;` and `&gt;` start with an ampersand. The inline `<span>` tags are then inserted in the middle of each entity, so the resulting text is no longer valid HTML.

For this behaviour to be a problem, all of the following conditions must hold:

1. `inlineDiffByWord` is used
2. The default `lineNormalizer` is used
3. The two provided texts differ at a position that starts with a character replaced by the `lineNormalizer`
4. A release >= 4.15 is used

**Workaround**
Override the `lineNormalizer`. E.g., by using the `SPLIT_BY_WORD_PATTERN` of release 4.12, in which [the ampersand was not considered a character that splits words](https://github.com/java-diff-utils/java-diff-utils/blob/0fd3bd8e061eed09dbb937c8ab9ba0969ba12264/java-diff-utils/src/main/java/com/github/difflib/text/DiffRowGenerator.java#L70).

**Solution approaches**
IMHO, the `SPLIT_BY_WORD_PATTERN` of release 4.15+ is fine and I do not consider it to be the problem.

The library could offer one of the following features:

1. a parameter that defines when the `lineNormalizer` should be applied (before or after diffing)
2. a second type of line normalizer that is applied after diffing
3. an option to have the library apply the [`processDiffs` function](https://github.com/java-diff-utils/java-diff-utils/blob/637cb7b6a309d66ff5e0cec2b3ffea52f867edc7/java-diff-utils/src/main/java/com/github/difflib/text/DiffRowGenerator.java#L190) to non-diffs as well
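
One way to picture the workaround described in the issue is a word splitter that never cuts an HTML entity apart. The following is a minimal, library-independent sketch, not the library's own `SPLIT_BY_WORD_PATTERN`: the class name `EntityAwareSplitter`, its `TOKEN` pattern, and the `escape` helper are all hypothetical names introduced here for illustration. `escape` only mimics the default normalization described in the issue (replacing `<` and `>`). The sketch assumes a custom splitter must return tokens that concatenate back to the original line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of java-diff-utils.
final class EntityAwareSplitter {

    // Illustrative token pattern (an assumption, not the library's pattern):
    // a whole HTML entity, a whitespace run, a run of word characters,
    // or any single remaining character. Every character matches some branch,
    // so the tokens always rejoin to the input.
    private static final Pattern TOKEN =
            Pattern.compile("&(?:[A-Za-z]+|#[0-9]+);|\\s+|\\w+|.", Pattern.DOTALL);

    // Mimics the default normalization described in the issue: escape < and >.
    static String escape(String line) {
        return line.replace("<", "&lt;").replace(">", "&gt;");
    }

    // Splits a line into tokens while keeping entities such as &lt; in one piece.
    static List<String> split(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(line);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Tokens of the escaped example line from the issue.
        System.out.println(split(escape("hello <world>")));
        System.out.println(split(escape("bye >world<")));
    }
}
```

Because `&lt;` and `&gt;` are each a single token here, a word-level diff would mark the whole entity as changed instead of wrapping only `lt` or `gt` in a `<span>`. If the builder in your release accepts a custom inline splitter (`inlineDiffBySplitter` appears to serve that purpose, but check it against the version you use), `EntityAwareSplitter::split` could be passed there in place of `inlineDiffByWord(true)`.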