New (alternative) message entity parsers by Badiboy · Pull Request #2556 · eternnoir/pyTelegramBotAPI

Badiboy · 2026-02-14T16:25:12Z

Description

I have now run into the problem of parsing hard-structured message entities myself, so the time has come to rewrite the apply_html_entities processor. I didn't have time to code it myself, so I vibe-coded the replacements. I added 3 test versions of the message entity parser (the apply_html_entities processor):

apply_html_entities_ds
apply_html_entities_gm
apply_html_entities_cg

They can be selected by setting the formatting.ENTITY_PARSER_MODE variable.

Describe your tests

I wrote some draft tests, and all of them are expected to pass. I'll take the parsers to production and see how things go. If any of them survives, I'll submit it as a replacement for the original one.

Added 3 test versions of message entity parsers (apply_html_entities processor).

Badiboy · 2026-02-14T16:26:08Z

@coder2020official What's your opinion?

Copilot

Pull request overview

This pull request adds three alternative implementations of the message entity parser (apply_html_entities) function, generated by AI coding assistants (DeepSeek, Gemini, and ChatGPT). The implementations can be selected via the ENTITY_PARSER_MODE configuration variable. The PR aims to provide alternative approaches to handling hard-structured message entity parsing, which will be tested in production to determine whether any of them should replace the original implementation.

Changes:

Added global configuration variable ENTITY_PARSER_MODE to select parser implementation
Added DeepSeek implementation (apply_html_entities_ds) with EntityProcessor class using recursive entity processing
Added Gemini implementation (apply_html_entities_gm) using marker-based tag insertion with intersection handling
Added ChatGPT implementation (apply_html_entities_cg) using tree-based entity rendering
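The marker-based approach listed for the Gemini variant can be illustrated with a minimal sketch. This is not the PR's code: the Entity dataclass, the tag table and render_with_markers are hypothetical, offsets are treated as Python string indices (Telegram actually counts entity offsets in UTF-16 code units), and crossing (intersecting) entities, which the Gemini version is described as handling, are not supported here.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Optional, Tuple


@dataclass
class Entity:
    # Hypothetical stand-in for a Telegram message entity.
    type: str
    offset: int  # assumed to be a Python str index, not a UTF-16 offset
    length: int
    url: Optional[str] = None


# Hypothetical subset of entity types; the real parsers support many more.
SIMPLE_TAGS = {"bold": "b", "italic": "i", "underline": "u", "code": "code"}


def open_close(entity: Entity) -> Tuple[str, str]:
    if entity.type == "text_link":
        # Escape the URL, including quotes, before putting it in an attribute.
        return f'<a href="{escape(entity.url or "", quote=True)}">', "</a>"
    tag = SIMPLE_TAGS.get(entity.type)
    return (f"<{tag}>", f"</{tag}>") if tag else ("", "")


def render_with_markers(text: str, entities: List[Entity]) -> str:
    """Insert open/close markers for properly nested (non-crossing) entities."""
    markers = []
    for idx, e in enumerate(entities):
        if e.length <= 0:
            continue
        opening, closing = open_close(e)
        # Sort keys: at the same position closes (0) come before opens (1);
        # longer entities open first and close last so nesting stays valid;
        # idx / -idx keep entities with identical spans properly nested.
        markers.append((e.offset, 1, -e.length, idx, opening))
        markers.append((e.offset + e.length, 0, e.length, -idx, closing))
    markers.sort()

    out = []
    cursor = 0
    for pos, _kind, _size, _tie, tag in markers:
        out.append(escape(text[cursor:pos], quote=False))  # plain text between markers
        out.append(tag)
        cursor = pos
    out.append(escape(text[cursor:], quote=False))
    return "".join(out)


print(render_with_markers("Hello world", [Entity("bold", 0, 5), Entity("italic", 0, 11)]))
# <i><b>Hello</b> world</i>
```

Sorting a flat list of markers keeps the sketch short; handling crossing entities would additionally require splitting one of them into pieces, which is where the real implementations differ.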


telebot/formatting.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

coder2020official · 2026-02-14T16:40:12Z

🥶

Are you sure this is a good idea? I've known about the issue for some time, and I even made a PR about a year ago (it was closed). I could take a look at this issue sometime later, but if this is an emergency--

I still don't like the idea of offering a choice here.

coder2020official · 2026-02-14T16:46:10Z

Couldn't find the PR, but had the code somewhere

coder2020official · 2026-02-14T16:48:02Z

Found my code, will see how it works

Badiboy · 2026-02-14T16:52:08Z

Are you sure this is a good idea?

The current version is clearly not fully compliant. There are various ways to solve this; I did what I could for now because I need it now )

You have an alternative approach, so let's check it.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 22 comments.


telebot/formatting.py

Copilot · 2026-02-14T18:16:29Z

telebot/formatting.py

+def apply_template(entity, inner: str, custom_subs: Optional[Dict[str, str]]) -> str:
+    t = entity.type
+    if t in ENTITY_TEMPLATES_CG:
+        tpl = ENTITY_TEMPLATES_CG[t]
+    elif custom_subs and t in custom_subs:
+        tpl = custom_subs[t]
+    else:
+        return inner
+
+    data = {"text": inner}
+
+    if t == "text_link":
+        data["url"] = getattr(entity, "url", "")
+    elif t == "text_mention":
+        data["url"] = f"tg://user?id={getattr(entity, 'user', {}).id if getattr(entity, 'user', None) else ''}"
+    elif t == "custom_emoji":
+        data["custom_emoji_id"] = getattr(entity, "custom_emoji_id", "")
+    elif (t == "pre") and getattr(entity, "language", None):
+        data["text"] = f'<code class="language-{entity.language}">{inner}</code></pre>'
+
+    return tpl.format(**data)


The ChatGPT implementation has a critical bug in the template handling. When entity type is "pre" with a language, it sets data["text"] to include closing </pre> tag (line 953), but then the template at line 910 already has {text} within <pre>{text}</pre>. This will result in <pre><code class="language-X">content</code></pre></pre> with a double closing tag.

Copilot · 2026-02-14T18:16:29Z

telebot/formatting.py

+    elif t == "expandable_blockquote":
+        return {'open': "<blockquote expandable>", 'close': "</blockquote>"}
+    elif t == "text_link":
+        return {'open': f'<a href="{entity.url}">', 'close': "</a>"}


The Gemini implementation doesn't escape URLs before inserting them into HTML attributes. This is a security vulnerability that could lead to XSS attacks if entity.url contains malicious content. Unlike the DeepSeek version which uses escape_html(entity.url or ""), this implementation inserts the URL directly.

Suggested change

-        return {'open': f'<a href="{entity.url}">', 'close': "</a>"}
+        return {'open': f'<a href="{escape_html(entity.url or "")}">', 'close': "</a>"}
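The concrete risk is that an unescaped double quote in the URL ends the href attribute early. A small sketch using the standard library's html.escape, where quote=True also escapes quote characters; whether the library's own escape_html escapes quotes is not shown in this thread, and link_open_tag is a hypothetical helper.

```python
from html import escape
from typing import Optional


def link_open_tag(url: Optional[str]) -> str:
    # Hypothetical helper: escape &, <, > and both quote characters before
    # placing the URL inside href="...".
    return f'<a href="{escape(url or "", quote=True)}">'


crafted = 'https://example.com/" title="injected'
print(f'<a href="{crafted}">')  # unescaped: the quote closes href and adds a title attribute
print(link_open_tag(crafted))    # escaped: the whole value stays inside href
```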

Copilot · 2026-02-14T18:16:29Z

telebot/formatting.py

+    elif t == "text_mention":
+        return {'open': f'<a href="tg://user?id={entity.user.id}">', 'close': "</a>"}
+    elif t == "custom_emoji":
+        return {'open': f'<tg-emoji emoji-id="{entity.custom_emoji_id}">', 'close': "</tg-emoji>"}


Direct attribute access without safety checks. If entity doesn't have a 'custom_emoji_id' attribute, this will raise an AttributeError. Should use getattr(entity, 'custom_emoji_id', '') for safety.

Suggested change

-        return {'open': f'<tg-emoji emoji-id="{entity.custom_emoji_id}">', 'close': "</tg-emoji>"}
+        emoji_id = getattr(entity, "custom_emoji_id", "")
+        return {'open': f'<tg-emoji emoji-id="{emoji_id}">', 'close': "</tg-emoji>"}
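A quick check of what the suggested getattr call does and does not cover, using types.SimpleNamespace objects with a made-up ID as stand-ins for real entities (an assumption for illustration). The default only applies when the attribute is missing entirely; if the attribute exists but holds None, getattr returns None, so this sketch adds an or "" fallback on top of the suggestion.

```python
from types import SimpleNamespace


def emoji_tags(entity) -> dict:
    # The default covers a missing attribute; "or" also covers one set to None.
    emoji_id = getattr(entity, "custom_emoji_id", None) or ""
    return {'open': f'<tg-emoji emoji-id="{emoji_id}">', 'close': "</tg-emoji>"}


with_id = SimpleNamespace(type="custom_emoji", custom_emoji_id="12345")
missing = SimpleNamespace(type="custom_emoji")
is_none = SimpleNamespace(type="custom_emoji", custom_emoji_id=None)

print(getattr(is_none, "custom_emoji_id", ""))  # None: the default is not used
print(emoji_tags(with_id)["open"])              # <tg-emoji emoji-id="12345">
print(emoji_tags(missing)["open"])              # <tg-emoji emoji-id="">
```

Either way this only avoids the exception; an entity without an ID still produces an empty emoji-id attribute.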


Copilot · 2026-02-14T18:16:32Z

telebot/formatting.py

+    elif t == "custom_emoji":
+        data["custom_emoji_id"] = getattr(entity, "custom_emoji_id", "")
+    elif (t == "pre") and getattr(entity, "language", None):
+        data["text"] = f'<code class="language-{entity.language}">{inner}</code></pre>'


This implementation has the same issue as in the DeepSeek version. The string stored in data["text"] already ends with a closing </pre> tag, and when template.format() is called the template appends its own </pre> after the {text} placeholder, so the rendered HTML ends up with two closing tags.

Suggested change

-        data["text"] = f'<code class="language-{entity.language}">{inner}</code></pre>'
+        data["text"] = f'<code class="language-{entity.language}">{inner}</code>'
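The effect of this fix can be checked in isolation. The sketch assumes, as the first review comment states, that the pre template is <pre>{text}</pre>; TEMPLATE_PRE and render_pre are hypothetical names, not the PR's code.

```python
from typing import Optional

# Assumed template, per the review comment; the full ENTITY_TEMPLATES_CG is not shown.
TEMPLATE_PRE = "<pre>{text}</pre>"


def render_pre(inner: str, language: Optional[str] = None) -> str:
    # Only the <code> wrapper goes into {text}; the template supplies </pre>.
    text = f'<code class="language-{language}">{inner}</code>' if language else inner
    return TEMPLATE_PRE.format(text=text)


# Original line: the inner text already ends with </pre>, so the tag is doubled.
buggy = TEMPLATE_PRE.format(text='<code class="language-python">x = 1</code></pre>')
print(buggy)                          # <pre><code class="language-python">x = 1</code></pre></pre>
print(render_pre("x = 1", "python"))  # <pre><code class="language-python">x = 1</code></pre>
```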

Copilot · 2026-02-14T18:16:32Z

telebot/formatting.py

+# Alternative message entities parsers. Can be:
+# "deepseek" - deepseek version
+# "gemini" - gemini version
+# "chatgpt" - chatgpt version
+# other values - original version
+ENTITY_PARSER_MODE = None
+
+


The new ENTITY_PARSER_MODE configuration variable lacks documentation. It should have a docstring explaining its purpose, valid values, and how to use it. This is especially important for a public API configuration option.

Suggested change

-# Alternative message entities parsers. Can be:
-# "deepseek" - deepseek version
-# "gemini" - gemini version
-# "chatgpt" - chatgpt version
-# other values - original version
-ENTITY_PARSER_MODE = None
+# Configuration option for selecting an alternative message entities parser.
+# This is part of the public API and can be set by library users.
+ENTITY_PARSER_MODE: Optional[str] = None
+"""Configuration mode for alternative message entity parsers.
+
+Set this variable to one of the supported string values to select a
+specific parser implementation:
+
+- ``"deepseek"`` – use the DeepSeek-based parser implementation.
+- ``"gemini"`` – use the Gemini-based parser implementation.
+- ``"chatgpt"`` – use the ChatGPT-based parser implementation.
+- ``None`` or any other value – use the original, built-in parser.
+
+Typical usage is to set :data:`ENTITY_PARSER_MODE` once at application
+start-up, before creating bots or sending/processing messages that
+contain entities, for example:
+
+.. code:: python3
+
+    from telebot import formatting
+
+    formatting.ENTITY_PARSER_MODE = "deepseek"
+"""

Copilot · 2026-02-14T18:16:32Z

telebot/formatting.py

+    mapping = utf16_index_map(text)
+    tree = build_tree(entities, mapping)
+    return render(text, tree, custom_subs)
+#endregion
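The hunk passes offsets through utf16_index_map because Telegram measures entity offset and length in UTF-16 code units, while Python indexes str by code point, so the two diverge after any character outside the Basic Multilingual Plane (most emoji). The PR's helper body is not shown, so this is a sketch of one plausible shape for such a map under that assumption; utf16_offsets_to_str_index is a hypothetical name.

```python
from typing import List


def utf16_offsets_to_str_index(text: str) -> List[int]:
    """Return m such that m[u] is the str index for UTF-16 offset u.

    The list has one extra entry so that offset + length can point just past
    the last character. An offset that lands in the middle of a surrogate
    pair maps to the index of that pair's character.
    """
    mapping: List[int] = []
    for i, ch in enumerate(text):
        units = 2 if ord(ch) > 0xFFFF else 1  # astral characters need a surrogate pair
        mapping.extend([i] * units)
    mapping.append(len(text))
    return mapping


text = "a\U0001F600b"  # "a", an emoji, "b"
m = utf16_offsets_to_str_index(text)
# A bold entity on "b" arrives as offset=3, length=1 in UTF-16 units.
print(m)                    # [0, 1, 1, 2, 3]
print(text[m[3]:m[3 + 1]])  # b
```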


The region comments (#region, #endregion) are IDE-specific markers that may not be meaningful to all developers. While they help with code folding in some IDEs, they don't follow Python conventions. Consider using standard docstrings or comment blocks instead to maintain portability and clarity across different development environments.

Suggested change

-#endregion


Badiboy · 2026-02-14T18:54:36Z

@coder2020official Now? )

coder2020official · 2026-02-14T19:01:14Z

If you insist on this structure, I guess it's okay.

coder2020official · 2026-02-14T19:02:08Z

Personally, though, I wouldn't want to have this in the release..

Badiboy · 2026-02-14T19:02:35Z

We'll remove the unnecessary methods once we've confirmed that one of them is fully compliant, and maybe even remove the original. Right now I need the ability to switch between methods to test them.

Badiboy · 2026-02-14T19:03:03Z

Personally though I wouldn't want to have this in the release..

So what do you propose? Forcibly replacing the original one with yours? Or something else?

Badiboy · 2026-02-14T19:05:15Z

OK, this is the wrong way to test. I'll run them locally.

coder2020official · 2026-02-14T19:05:30Z

I think we're rushing a solution. We shouldn't release this straight into production.

New (alternative) message entity parsers

d8f1595

Added 3 test versions of message entity parsers (apply_html_entities processor).

Badiboy requested a review from Copilot February 14, 2026 16:26

Copilot started reviewing on behalf of Badiboy February 14, 2026 16:26

Copilot AI reviewed Feb 14, 2026


Badiboy and others added 2 commits February 14, 2026 19:39

Update telebot/formatting.py

9931d19

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update telebot/formatting.py

c46d91c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Badiboy added 2 commits February 14, 2026 21:05

Fix issues, apply CoPilot hints

e710611

Merge branch 'master' of https://github.com/Badiboy/pyTelegramBotAPI

cfa288a

Badiboy requested a review from Copilot February 14, 2026 18:10

Copilot started reviewing on behalf of Badiboy February 14, 2026 18:10

Copilot AI reviewed Feb 14, 2026


Added 4th alternative. Manually coded )

ab8d1ad

Badiboy closed this Feb 14, 2026
