Expected Behavior
Uploading a single shapefile dataset should complete in a reasonable time (30-60 seconds) regardless of how many datasets already exist in the GeoNode instance.
Actual Behavior
On GeoNode instances with thousands of existing datasets, uploading a single new dataset takes very long (e.g., 2:40 minutes for a simple shapefile).
GeoServer logs show repeated requests like:
geoserver4geonode | 05 May 11:57:23 INFO [geoserver.filters] - 172.18.0.8 "GET /geoserver/rest/workspaces/geonode/coveragestores/Karabalgasun_2018_14_100m_optimized/coverages.xml" took 8ms
geoserver4geonode | 05 May 11:57:23 INFO [geoserver.filters] - 172.18.0.8 "GET /geoserver/rest/workspaces/geonode/coveragestores/Karabalgasun_2018_15_100m_optimized/coverages.xml" took 8ms
geoserver4geonode | 05 May 11:57:23 INFO [geoserver.filters] - 172.18.0.8 "GET /geoserver/rest/workspaces/geonode/coveragestores/Karabalgasun_2018_16_100m_optimized/coverages.xml" took 8ms
geoserver4geonode | 05 May 11:57:23 INFO [geoserver.filters] - 172.18.0.8 "GET /geoserver/rest/workspaces/geonode/coveragestores/Karabalgasun_Arctron_2007_UTM48_merged.tif/coverages.xml" took 7ms
... (one [geoserver.filters] request per existing coveragestore)
This looks similar to an issue we faced in GeoNode 3: #7618
Steps to Reproduce the Problem
- Set up a GeoNode instance with 1000+ existing datasets/many coveragestores
- Attempt to upload a new shapefile via the "Upload Dataset" UI
- Observe the upload time
Root Cause
My guess is the bottleneck is in geonode/upload/publisher.py in the sanity_checks() method (line ~176). When validating SRID/projection, the code calls:
res = self.cat.get_resource(x, workspace=self.workspace) # ← NO store parameter!
Without a store parameter, gsconfig.Catalog.get_resource() performs a broad search across ALL stores:
- Calls get_stores(workspaces=workspace) → returns all 1000+ stores
- Loops through each store and calls store.get_resources(name=name)
- Result: 1000+ REST API calls for a single resource check
This occurs for each resource being validated.
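The cost of the broad search can be illustrated with a toy model. This is only a sketch: FakeCatalog and FakeStore are hypothetical stand-ins that count simulated REST round trips, not gsconfig classes, and the real client may batch or cache requests differently.

```python
from dataclasses import dataclass


@dataclass
class FakeStore:
    """Hypothetical store holding a list of resource names."""
    name: str
    resource_names: list


@dataclass
class FakeCatalog:
    """Toy stand-in for a catalog client; counts simulated REST round trips."""
    stores: list
    rest_calls: int = 0

    def get_stores(self):
        self.rest_calls += 1  # one request to list the stores
        return self.stores

    def get_resources(self, store, name):
        self.rest_calls += 1  # one request per store queried
        return [n for n in store.resource_names if n == name]

    def get_resource(self, name, store=None):
        # Without a store, every store in the workspace is queried in turn.
        candidates = [store] if store is not None else self.get_stores()
        found = []
        for s in candidates:
            found.extend(self.get_resources(s, name))
        return found[0] if found else None


def build_catalog(n_stores):
    """Create a catalog with n_stores stores, one resource each."""
    stores = [FakeStore(f"store_{i}", [f"layer_{i}"]) for i in range(n_stores)]
    return FakeCatalog(stores)


broad = build_catalog(1000)
broad.get_resource("layer_999")
print(broad.rest_calls)   # 1001: one store listing plus one query per store

scoped = build_catalog(1000)
scoped.get_resource("layer_999", store=scoped.stores[999])
print(scoped.rest_calls)  # 1: only the named store is queried
```

In this model the number of requests grows linearly with the number of stores unless the lookup is scoped, which matches the one-request-per-coveragestore pattern in the GeoServer log.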
Solution
Add the store= parameter to the get_resource() call in sanity_checks(). Since the resource was just published to self.store, we can search only there:
def sanity_checks(self, resources):
    for _resource in resources:
        # OPTIMIZATION: Add store= parameter to search only in the specific store
        res = self.cat.get_resource(
            _resource.get("name"),
            store=self.store,  # ← KEY FIX
            workspace=self.workspace
        )
        if not res or (res and not res[0].projection):
            raise PublishResourceException(
                f"The SRID for the resource {_resource} is not correctly set. Please check Geoserver logs"
            )
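A regression check for this change could look roughly like the following. It is a sketch under assumptions: PublisherSketch and the local PublishResourceException are hypothetical stand-ins rather than GeoNode's classes, the catalog is a mock, and the code assumes get_resource() returns a single resource object or None (so it reads res.projection rather than res[0].projection).

```python
from unittest.mock import MagicMock


class PublishResourceException(Exception):
    """Local stand-in for GeoNode's exception of the same name."""


class PublisherSketch:
    """Hypothetical, minimal stand-in for the publisher; not GeoNode's class."""

    def __init__(self, cat, store, workspace):
        self.cat = cat
        self.store = store
        self.workspace = workspace

    def sanity_checks(self, resources):
        for _resource in resources:
            res = self.cat.get_resource(
                _resource.get("name"),
                store=self.store,  # scope the lookup to the known store
                workspace=self.workspace,
            )
            # Assumption: get_resource returns one resource object or None.
            if res is None or not res.projection:
                raise PublishResourceException(
                    f"The SRID for the resource {_resource} is not correctly set."
                )


# The mocked catalog returns a resource with a projection set.
cat = MagicMock()
cat.get_resource.return_value = MagicMock(projection="EPSG:4326")
publisher = PublisherSketch(cat, store="my_store", workspace="geonode")
publisher.sanity_checks([{"name": "roads"}])

# Fails loudly if the store argument is ever dropped from the call again.
cat.get_resource.assert_called_once_with(
    "roads", store="my_store", workspace="geonode"
)
```

Asserting on the call arguments keeps the test independent of a running GeoServer while still guarding against the unscoped lookup.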
Performance Impact
- Before: 2:40 minutes for a single shapefile upload
- After: 44 seconds
Specifications
- GeoNode version: 5.0.1
- Installation type: geonode-project (iDAI.geoserver fork)
- Installation method: docker
- Platform: Ubuntu 24.04, GeoServer 2.27.5
- Additional details:
  - Instance has 2500+ existing datasets
  - Affected file: geonode/upload/publisher.py
  - Root cause: gsconfig.Catalog.get_resource() behavior without store parameter