
GML-2078 Release 1.3.1#34

Merged
chengbiao-jin merged 9 commits into main from release_1.3.1 on Apr 22, 2026

Conversation


chengbiao-jin (Collaborator) commented on Apr 22, 2026

User description

Bug fixes.


PR Type

Bug fix, Enhancement


Description

  • Harden TigerGraph query and ingest flows

    • Reinstall community queries when needed
    • Verify queries and loading jobs post-run
    • Auto-recreate missing loading jobs
    • Parse load stats for accurate counts
  • Fix chat and connection stability

    • Handle early WebSocket disconnects safely
    • Check graph existence via getVertexTypes
    • Use listGraphs() over deprecated LS USER
    • Prevent token revoke JSON destructor errors
  • Reduce graph payloads and duplication

    • Fetch only description for entities
    • Fuzzy-deduplicate near-duplicate descriptions
    • Limit stream_community output to IDs
    • Remove deprecated query metadata usage
  • Expand file processing support

    • Add .png and .gif extraction support
    • Clarify JSONL copy versus conversion logs

Diagram Walkthrough

flowchart LR
  A["GraphRAG pipeline"] -->|"installs"| B["Required queries"]
  A -->|"reinstalls before community step"| C["Community queries"]
  A -->|"verifies after run"| D["Queries and loading jobs"]
  E["Server ingestion"] -->|"ensures exists"| F["Loading job recreation"]
  E -->|"parses"| G["Load statistics"]
  H["Chat websocket"] -->|"handles"| I["Early disconnects"]
  J["Entity lookup"] -->|"fetches only"| K["Description attribute"]
  K -->|"deduplicates with"| L["Fuzzy matching"]

File Walkthrough

Relevant files
Bug fix
8 files
tg_proxy.py
Fix token revocation payload and destructor safety             
+14/-10 
util.py
Scope required queries and optimize vertex checks               
+27/-23 
workers.py
Harden query install checks and deduplicate descriptions 
+24/-7   
util.py
Improve install error detection and vertex fetching           
+6/-2     
workers.py
Add fuzzy description deduplication for entities                 
+25/-8   
agent_graph.py
Detect SupportAI via schema and remove metadata calls       
+5/-15   
ui.py
Handle websocket disconnects and replace deprecated graph listing
+17/-13 
supportai.py
Recreate loading jobs and parse ingestion statistics         
+62/-11 
Enhancement
3 files
text_extractors.py
Add image formats and clearer preparation logging               
+3/-1     
graph_rag.py
Reinstall community queries and verify pipeline artifacts
+41/-0   
stream_community.gsql
Reduce community query output to vertex IDs                           
+1/-2     
Error handling
1 file
main.py
Validate graph existence before starting ECC tasks             
+7/-1     
Documentation
1 file
CHANGELOG.md
Document release 1.3.1 fixes and improvements                       
+15/-0   
Configuration changes
1 file
VERSION
Bump application version to 1.3.1                                               
+1/-1     
Dependencies
1 file
requirements.txt
Upgrade `pyTigerGraph` dependency to supported version     
+1/-1     

chengbiao-jin and others added 6 commits April 16, 2026 15:47
- Upgrade pyTigerGraph dependency from ==1.9.1 to >=2.0.3
- Fix WebSocket chat crash on early client disconnect (catch
  WebSocketDisconnect during auth and conversation ID phases)
- Auto-recreate loading jobs before ingestion if missing
- Parse loading job statistics for accurate document/rejected line counts
- Clarify file preparation log to distinguish JSONL copies from conversions
- Extract COMMUNITY_QUERIES and REQUIRED_QUERIES constants to scope
  query reinstallation per pipeline step
- Reinstall community queries before community detection to prevent
  404 errors from missing louvain queries
- Add post-pipeline verification of query and loading job status
- Harden error detection in install_queries (case-insensitive, check
  for "does not exist" and "failed" patterns)
- Replace listGraphs() with getVertexTypes() for graph existence check
  in ECC main to avoid KeyError on GraphName
- Switch chatbot supportai detection from query metadata to
  DocumentChunk vertex type existence check
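The early-disconnect fix for the chat WebSocket can be sketched as follows. This is a minimal stand-alone illustration, not the PR's code: `WebSocketDisconnect` stands in for the Starlette/FastAPI exception of the same name, and `FakeWebSocket`, `chat_endpoint`, and the two handshake reads are hypothetical.

```python
# Sketch: catch an early client disconnect during the auth and
# conversation-ID phases instead of letting the handler crash.
# WebSocketDisconnect is a stand-in for starlette.websockets.WebSocketDisconnect.
import asyncio


class WebSocketDisconnect(Exception):
    """Stand-in for starlette.websockets.WebSocketDisconnect."""


class FakeWebSocket:
    """Simulates a client that disconnects before sending credentials."""

    async def receive_text(self) -> str:
        raise WebSocketDisconnect


async def chat_endpoint(websocket) -> str:
    try:
        # Auth and conversation-ID handshake: the client may vanish here.
        creds = await websocket.receive_text()
        convo_id = await websocket.receive_text()
    except WebSocketDisconnect:
        # Client went away early; exit quietly instead of crashing.
        return "disconnected"
    return f"chatting:{convo_id}"


result = asyncio.run(chat_endpoint(FakeWebSocket()))
print(result)  # disconnected
```

The point of the pattern is that the `try` covers every `receive` before the main chat loop, so a client closing the tab mid-handshake ends the coroutine cleanly.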
…etadata, reduce stream_community response size

- Fix __del__ JSON parse error by using json.dumps instead of str() for token revocation payload
- Replace deprecated LS USER GSQL command with conn.listGraphs() REST API
- Remove getQueryMetadata() calls and unused query_output_format field from agent search methods
- Project only vertex ID in stream_community query to stay within 5MB response limit
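The `__del__` fix above rests on a general point worth illustrating: calling `str()` on a Python dict yields single-quoted pseudo-JSON that no JSON parser accepts, while `json.dumps` emits valid JSON. A minimal sketch (the payload keys here are illustrative, not the actual revocation payload):

```python
import json

# Hypothetical token-revocation payload; keys are illustrative only.
payload = {"secret": "alias123", "token": "abc"}

print(str(payload))         # {'secret': 'alias123', 'token': 'abc'}  -- not JSON
print(json.dumps(payload))  # {"secret": "alias123", "token": "abc"}  -- valid JSON

# str() output fails to parse as JSON; json.dumps output round-trips.
try:
    json.loads(str(payload))
    parse_failed = False
except json.JSONDecodeError:
    parse_failed = True
```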
…xists, fuzzy dedup in get_vert_desc

- check_vertex_exists uses direct REST call with select=description to avoid fetching all vertex attributes
- get_vert_desc uses SequenceMatcher fuzzy matching (threshold 0.85) to prevent near-duplicate descriptions from accumulating in Entity SET<STRING>
- Optimized matching with length pre-filter and quick_ratio before full ratio computation
- Remove COMMUNITY_QUERIES from REQUIRED_QUERIES to avoid installing them during init
- Community queries are already installed at the start of community detection (graph_rag.py)
- Reduces init timeout risk by deferring INSTALL QUERY ALL to when community queries are needed

tg-pr-agent Bot commented Apr 22, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Empty response

The new description merge path assumes the vertex lookup always returns at least one record with a description attribute. If the REST call returns an empty list for a missing vertex, or a different payload shape, indexing into the first element will raise and break extraction instead of treating the vertex as absent.

async def get_vert_desc(conn, v_id, node: Node):
    new_desc = node.properties.get("description", "")
    exists = await util.check_vertex_exists(conn, v_id)
    if not exists.get("error", False):
        existing_descs = exists["resp"][0]["attributes"]["description"]
        if not new_desc or _is_near_duplicate(new_desc, existing_descs):
            return existing_descs if existing_descs else [new_desc]
        return existing_descs + [new_desc]
    return [new_desc]
Empty response

Similar entity-description deduplication logic assumes the vertex lookup response contains a first result with attributes.description. This should be validated against not-found and schema-variant responses, otherwise extraction can fail on legitimate misses.

async def get_vert_desc(conn, v_id, node: Node):
    new_desc = node.properties.get("description", "")
    exists = await util.check_vertex_exists(conn, v_id)
    if not exists.get("error", False):
        existing_descs = exists["resp"][0]["attributes"]["description"]
        if not new_desc or _is_near_duplicate(new_desc, existing_descs):
            return existing_descs if existing_descs else [new_desc]
        return existing_descs + [new_desc]
    return [new_desc]
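A defensively guarded variant addressing both findings might validate the payload shape before indexing. This is a hypothetical sketch, not the PR's code: the response structure mirrors the snippets above, and an exact-match check stands in for the fuzzy comparison to keep the example self-contained.

```python
# Sketch: treat empty or unexpected lookup responses as "vertex absent"
# instead of raising IndexError/KeyError during extraction.
def merge_descriptions(lookup: dict, new_desc: str) -> list:
    resp = lookup.get("resp")
    if lookup.get("error", False) or not isinstance(resp, list) or not resp:
        return [new_desc]  # missing vertex or malformed payload
    attrs = resp[0].get("attributes", {})
    existing = attrs.get("description") or []
    if not new_desc or new_desc in existing:
        return existing if existing else [new_desc]
    return existing + [new_desc]


# An empty result list no longer raises; the vertex is treated as absent.
print(merge_descriptions({"error": False, "resp": []}, "a db"))  # ['a db']
print(merge_descriptions(
    {"error": False, "resp": [{"attributes": {"description": ["old"]}}]},
    "new",
))  # ['old', 'new']
```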
Recreate parsing

Loading job recreation checks and post-install validation rely on substring matching against ls output. That output format is version-sensitive, so a formatting change could incorrectly report jobs as missing or recreate them unnecessarily during ingestion.

def _ensure_loading_jobs(conn: TigerGraphConnection, graphname: str, load_job_id: str) -> None:
    """Check that the required loading job exists; recreate it if missing."""
    current_schema = conn.gsql(f"USE GRAPH {graphname}\n ls")
    marker = f"- CREATE LOADING JOB {load_job_id} {{"
    if marker in current_schema:
        return

    gsql_file = _LOADING_JOB_GSQL_FILES.get(load_job_id)
    if not gsql_file:
        raise Exception(f"Loading job '{load_job_id}' not found and no GSQL template available to recreate it")

    logger.info(f"Loading job '{load_job_id}' missing — recreating from {gsql_file}")
    with open(gsql_file, "r") as f:
        q_body = f.read()
    result = conn.gsql(f"USE GRAPH {graphname}\nBEGIN\n{q_body}\nEND\n")
    logger.info(f"Loading job creation result: {result}")
    if isinstance(result, str) and ("error" in result.lower() or "failed" in result.lower()):
        raise Exception(f"Failed to recreate loading job '{load_job_id}': {result}")
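Since the exact `ls` formatting (leading `- `, brace placement, spacing) is version-sensitive, as the reviewer notes, a more tolerant check could match with a regex instead of an exact substring. A sketch under that assumption; the pattern itself is illustrative:

```python
import re


def loading_job_exists(ls_output: str, load_job_id: str) -> bool:
    """Version-tolerant check for a loading job in `ls` output.

    Matches 'CREATE LOADING JOB <id>' regardless of the leading '- ',
    internal whitespace, or whether '{' follows on the same line.
    """
    pattern = rf"CREATE\s+LOADING\s+JOB\s+{re.escape(load_job_id)}\b"
    return re.search(pattern, ls_output, flags=re.IGNORECASE) is not None


# Both formatting variants are recognized; a different job ID is not.
print(loading_job_exists("  - CREATE LOADING JOB load_docs {", "load_docs"))   # True
print(loading_job_exists("CREATE LOADING JOB load_docs\n{", "load_docs"))      # True
print(loading_job_exists("- CREATE LOADING JOB load_docs_v2 {", "load_docs"))  # False
```

The trailing `\b` prevents `load_docs` from matching a longer identifier such as `load_docs_v2`, which a raw substring check would accept.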

Comment thread ecc/app/graphrag/workers.py
Comment thread ecc/app/supportai/workers.py
chengbiao-jin merged commit b003c0e into main on Apr 22, 2026
1 check failed
chengbiao-jin deleted the release_1.3.1 branch on April 22, 2026 05:23