
GML-2078 Release 1.3.1#34

Merged
chengbiao-jin merged 9 commits into main from release_1.3.1 on Apr 22, 2026

Conversation


chengbiao-jin (Collaborator) commented on Apr 22, 2026

User description

Bug fixes.


PR Type

Bug fix, Enhancement


Description

  • Harden TigerGraph query and ingest flows

    • Reinstall community queries when needed
    • Verify queries and loading jobs post-run
    • Auto-recreate missing loading jobs
    • Parse load stats for accurate counts
  • Fix chat and connection stability

    • Handle early WebSocket disconnects safely
    • Check graph existence via getVertexTypes
    • Use listGraphs() over deprecated LS USER
    • Prevent token revoke JSON destructor errors
  • Reduce graph payloads and duplication

    • Fetch only description for entities
    • Fuzzy-deduplicate near-duplicate descriptions
    • Limit stream_community output to IDs
    • Remove deprecated query metadata usage
  • Expand file processing support

    • Add .png and .gif extraction support
    • Clarify JSONL copy versus conversion logs

Diagram Walkthrough

flowchart LR
  A["GraphRAG pipeline"] -->|"installs"| B["Required queries"]
  A -->|"reinstalls before community step"| C["Community queries"]
  A -->|"verifies after run"| D["Queries and loading jobs"]
  E["Server ingestion"] -->|"ensures exists"| F["Loading job recreation"]
  E -->|"parses"| G["Load statistics"]
  H["Chat websocket"] -->|"handles"| I["Early disconnects"]
  J["Entity lookup"] -->|"fetches only"| K["Description attribute"]
  K -->|"deduplicates with"| L["Fuzzy matching"]

File Walkthrough

Relevant files
Bug fix
8 files
tg_proxy.py
Fix token revocation payload and destructor safety             
+14/-10 
util.py
Scope required queries and optimize vertex checks               
+27/-23 
workers.py
Harden query install checks and deduplicate descriptions 
+24/-7   
util.py
Improve install error detection and vertex fetching           
+6/-2     
workers.py
Add fuzzy description deduplication for entities                 
+25/-8   
agent_graph.py
Detect SupportAI via schema and remove metadata calls       
+5/-15   
ui.py
Handle websocket disconnects and replace deprecated graph listing
+17/-13 
supportai.py
Recreate loading jobs and parse ingestion statistics         
+62/-11 
Enhancement
3 files
text_extractors.py
Add image formats and clearer preparation logging               
+3/-1     
graph_rag.py
Reinstall community queries and verify pipeline artifacts
+41/-0   
stream_community.gsql
Reduce community query output to vertex IDs                           
+1/-2     
Error handling
1 file
main.py
Validate graph existence before starting ECC tasks             
+7/-1     
Documentation
1 file
CHANGELOG.md
Document release 1.3.1 fixes and improvements                       
+15/-0   
Configuration changes
1 file
VERSION
Bump application version to 1.3.1                                               
+1/-1     
Dependencies
1 file
requirements.txt
Upgrade `pyTigerGraph` dependency to supported version     
+1/-1     

chengbiao-jin and others added 6 commits April 16, 2026 15:47
- Upgrade pyTigerGraph dependency from ==1.9.1 to >=2.0.3
- Fix WebSocket chat crash on early client disconnect (catch
  WebSocketDisconnect during auth and conversation ID phases)
- Auto-recreate loading jobs before ingestion if missing
- Parse loading job statistics for accurate document/rejected line counts
- Clarify file preparation log to distinguish JSONL copies from conversions
- Extract COMMUNITY_QUERIES and REQUIRED_QUERIES constants to scope
  query reinstallation per pipeline step
- Reinstall community queries before community detection to prevent
  404 errors from missing louvain queries
- Add post-pipeline verification of query and loading job status
- Harden error detection in install_queries (case-insensitive, check
  for "does not exist" and "failed" patterns)
- Replace listGraphs() with getVertexTypes() for graph existence check
  in ECC main to avoid KeyError on GraphName
- Switch chatbot supportai detection from query metadata to
  DocumentChunk vertex type existence check
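The early-disconnect fix for the chat WebSocket can be sketched as follows. This is a minimal stand-alone illustration, not the PR's code: `WebSocketDisconnect` stands in for the Starlette/FastAPI exception of the same name, and `FakeWebSocket`, `chat_endpoint`, and the two handshake reads are hypothetical.

```python
# Sketch: catch an early client disconnect during the auth and
# conversation-ID phases instead of letting the handler crash.
# WebSocketDisconnect is a stand-in for starlette.websockets.WebSocketDisconnect.
import asyncio


class WebSocketDisconnect(Exception):
    """Stand-in for starlette.websockets.WebSocketDisconnect."""


class FakeWebSocket:
    """Simulates a client that disconnects before sending credentials."""

    async def receive_text(self) -> str:
        raise WebSocketDisconnect


async def chat_endpoint(websocket) -> str:
    try:
        # Auth and conversation-ID handshake: the client may vanish here.
        creds = await websocket.receive_text()
        convo_id = await websocket.receive_text()
    except WebSocketDisconnect:
        # Client went away early; exit quietly instead of crashing.
        return "disconnected"
    return f"chatting:{convo_id}"


result = asyncio.run(chat_endpoint(FakeWebSocket()))
print(result)  # disconnected
```

The point of the pattern is that the `try` covers every `receive` before the main chat loop, so a client closing the tab mid-handshake ends the coroutine cleanly.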
…etadata, reduce stream_community response size

- Fix __del__ JSON parse error by using json.dumps instead of str() for token revocation payload
- Replace deprecated LS USER GSQL command with conn.listGraphs() REST API
- Remove getQueryMetadata() calls and unused query_output_format field from agent search methods
- Project only vertex ID in stream_community query to stay within 5MB response limit
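The `__del__` fix above rests on a general point worth illustrating: calling `str()` on a Python dict yields single-quoted pseudo-JSON that no JSON parser accepts, while `json.dumps` emits valid JSON. A minimal sketch (the payload keys here are illustrative, not the actual revocation payload):

```python
import json

# Hypothetical token-revocation payload; keys are illustrative only.
payload = {"secret": "alias123", "token": "abc"}

print(str(payload))         # {'secret': 'alias123', 'token': 'abc'}  -- not JSON
print(json.dumps(payload))  # {"secret": "alias123", "token": "abc"}  -- valid JSON

# str() output fails to parse as JSON; json.dumps output round-trips.
try:
    json.loads(str(payload))
    parse_failed = False
except json.JSONDecodeError:
    parse_failed = True
```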
…xists, fuzzy dedup in get_vert_desc

- check_vertex_exists uses direct REST call with select=description to avoid fetching all vertex attributes
- get_vert_desc uses SequenceMatcher fuzzy matching (threshold 0.85) to prevent near-duplicate descriptions from accumulating in Entity SET<STRING>
- Optimized matching with length pre-filter and quick_ratio before full ratio computation
- Remove COMMUNITY_QUERIES from REQUIRED_QUERIES to avoid installing them during init
- Community queries are already installed at the start of community detection (graph_rag.py)
- Reduces init timeout risk by deferring INSTALL QUERY ALL to when community queries are needed

tg-pr-agent Bot commented Apr 22, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Empty response

The new description merge path assumes the vertex lookup always returns at least one record with a description attribute. If the REST call returns an empty list for a missing vertex, or a different payload shape, indexing into the first element will raise and break extraction instead of treating the vertex as absent.

async def get_vert_desc(conn, v_id, node: Node):
    new_desc = node.properties.get("description", "")
    exists = await util.check_vertex_exists(conn, v_id)
    if not exists.get("error", False):
        existing_descs = exists["resp"][0]["attributes"]["description"]
        if not new_desc or _is_near_duplicate(new_desc, existing_descs):
            return existing_descs if existing_descs else [new_desc]
        return existing_descs + [new_desc]
    return [new_desc]
Empty response

Similar entity-description deduplication logic assumes the vertex lookup response contains a first result with attributes.description. This should be validated against not-found and schema-variant responses, otherwise extraction can fail on legitimate misses.

async def get_vert_desc(conn, v_id, node: Node):
    new_desc = node.properties.get("description", "")
    exists = await util.check_vertex_exists(conn, v_id)
    if not exists.get("error", False):
        existing_descs = exists["resp"][0]["attributes"]["description"]
        if not new_desc or _is_near_duplicate(new_desc, existing_descs):
            return existing_descs if existing_descs else [new_desc]
        return existing_descs + [new_desc]
    return [new_desc]
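A defensively guarded variant addressing both findings might validate the payload shape before indexing. This is a hypothetical sketch, not the PR's code: the response structure mirrors the snippets above, and an exact-match check stands in for the fuzzy comparison to keep the example self-contained.

```python
# Sketch: treat empty or unexpected lookup responses as "vertex absent"
# instead of raising IndexError/KeyError during extraction.
def merge_descriptions(lookup: dict, new_desc: str) -> list:
    resp = lookup.get("resp")
    if lookup.get("error", False) or not isinstance(resp, list) or not resp:
        return [new_desc]  # missing vertex or malformed payload
    attrs = resp[0].get("attributes", {})
    existing = attrs.get("description") or []
    if not new_desc or new_desc in existing:
        return existing if existing else [new_desc]
    return existing + [new_desc]


# An empty result list no longer raises; the vertex is treated as absent.
print(merge_descriptions({"error": False, "resp": []}, "a db"))  # ['a db']
print(merge_descriptions(
    {"error": False, "resp": [{"attributes": {"description": ["old"]}}]},
    "new",
))  # ['old', 'new']
```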
Recreate parsing

Loading job recreation checks and post-install validation rely on substring matching against ls output. That output format is version-sensitive, so a formatting change could incorrectly report jobs as missing or recreate them unnecessarily during ingestion.

def _ensure_loading_jobs(conn: TigerGraphConnection, graphname: str, load_job_id: str) -> None:
    """Check that the required loading job exists; recreate it if missing."""
    current_schema = conn.gsql(f"USE GRAPH {graphname}\n ls")
    marker = f"- CREATE LOADING JOB {load_job_id} {{"
    if marker in current_schema:
        return

    gsql_file = _LOADING_JOB_GSQL_FILES.get(load_job_id)
    if not gsql_file:
        raise Exception(f"Loading job '{load_job_id}' not found and no GSQL template available to recreate it")

    logger.info(f"Loading job '{load_job_id}' missing — recreating from {gsql_file}")
    with open(gsql_file, "r") as f:
        q_body = f.read()
    result = conn.gsql(f"USE GRAPH {graphname}\nBEGIN\n{q_body}\nEND\n")
    logger.info(f"Loading job creation result: {result}")
    if isinstance(result, str) and ("error" in result.lower() or "failed" in result.lower()):
        raise Exception(f"Failed to recreate loading job '{load_job_id}': {result}")
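Since the exact `ls` formatting (leading `- `, brace placement, spacing) is version-sensitive, as the reviewer notes, a more tolerant check could match with a regex instead of an exact substring. A sketch under that assumption; the pattern itself is illustrative:

```python
import re


def loading_job_exists(ls_output: str, load_job_id: str) -> bool:
    """Version-tolerant check for a loading job in `ls` output.

    Matches 'CREATE LOADING JOB <id>' regardless of the leading '- ',
    internal whitespace, or whether '{' follows on the same line.
    """
    pattern = rf"CREATE\s+LOADING\s+JOB\s+{re.escape(load_job_id)}\b"
    return re.search(pattern, ls_output, flags=re.IGNORECASE) is not None


# Both formatting variants are recognized; a different job ID is not.
print(loading_job_exists("  - CREATE LOADING JOB load_docs {", "load_docs"))   # True
print(loading_job_exists("CREATE LOADING JOB load_docs\n{", "load_docs"))      # True
print(loading_job_exists("- CREATE LOADING JOB load_docs_v2 {", "load_docs"))  # False
```

The trailing `\b` prevents `load_docs` from matching a longer identifier such as `load_docs_v2`, which a raw substring check would accept.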

Comment thread ecc/app/graphrag/workers.py
Comment thread ecc/app/supportai/workers.py
chengbiao-jin merged commit b003c0e into main on Apr 22, 2026
1 check failed
chengbiao-jin deleted the release_1.3.1 branch on April 22, 2026 05:23