Commit 5aa2716

[opt] add more lakehouse fa
1 parent fea2e41 commit 5aa2716

6 files changed

Lines changed: 234 additions & 0 deletions

File tree

docs/faq/lakehouse-faq.md

Lines changed: 39 additions & 0 deletions
@@ -253,6 +253,28 @@ ln -s /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt /etc/ssl/certs/ca-

If the session timezone is already set to `Asia/Shanghai` but the query still fails, it indicates that the ORC file was generated with the timezone `+08:00`. During query execution, this timezone is required when parsing the ORC footer. In this case, you can try creating a symbolic link under the `/usr/share/zoneinfo/` directory that points `+08:00` to an equivalent timezone.

14. When querying a Hive table that uses JSON SerDe (e.g., `org.openx.data.jsonserde.JsonSerDe`), an error occurs: `failed to get schema` or `Storage schema reading not supported`

    When a Hive table is stored in JSON format (its ROW FORMAT SERDE is `org.openx.data.jsonserde.JsonSerDe`), the Hive Metastore may be unable to read the table's schema information through the default method, causing the following error when querying from Doris:

    ```
    errCode = 2, detailMessage = failed to get schema for table xxx in db xxx.
    reason: org.apache.hadoop.hive.metastore.api.MetaException:
    java.lang.UnsupportedOperationException: Storage schema reading not supported
    ```

    This can be resolved by adding `"get_schema_from_table" = "true"` to the Catalog properties. This parameter instructs Doris to retrieve the schema directly from the Hive table metadata instead of relying on the underlying storage's schema reader.

    ```sql
    CREATE CATALOG hive PROPERTIES (
        'type' = 'hms',
        'hive.metastore.uris' = 'thrift://x.x.x.x:9083',
        'get_schema_from_table' = 'true'
    );
    ```

    This parameter is supported since versions 2.1.10 and 3.0.6.
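    A table that triggers this error typically declares the OpenX JSON SerDe in its DDL. A minimal Hive sketch for illustration (the table and column names are hypothetical):

    ```sql
    -- Hypothetical Hive table using the OpenX JSON SerDe; querying it from
    -- Doris without 'get_schema_from_table' = 'true' can fail with
    -- "Storage schema reading not supported".
    CREATE TABLE json_tbl (
        id   BIGINT,
        name STRING
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    STORED AS TEXTFILE;
    ```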
## HDFS
1. When accessing HDFS 3.x, if you encounter the error `java.lang.VerifyError: xxx`: Doris versions prior to 1.2.1 depend on Hadoop 2.8. Update Hadoop to 2.10.2, or upgrade Doris to 1.2.2 or later.
@@ -322,6 +344,23 @@ ln -s /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt /etc/ssl/certs/ca-

- Copy `hdfs-site.xml` and `core-site.xml` to `fe/conf` and `be/conf`. (Recommended)
- In `hdfs-site.xml`, find the corresponding configuration `dfs.data.transfer.protection` and set this parameter in the catalog.

5. When querying a Hive Catalog table, an error occurs: `RPC response has a length of xxx exceeds maximum data length`

    For example:

    ```
    RPC response has a length of 1213486160 exceeds maximum data length
    ```

    The value `1213486160` is `0x48545450` in hexadecimal, which is the ASCII string `"HTTP"`. This indicates that the Doris FE attempted to connect to an HDFS NameNode RPC port but received an HTTP response instead.

    The root cause is that the HDFS NameNode port configured in the Catalog or in `hdfs-site.xml` is incorrect: an HTTP port was used where an RPC port is required. An HDFS NameNode typically exposes two types of ports:

    - **RPC port** (default: `8020` or `9000`): used for HDFS client communication (this is the port Doris should use).
    - **HTTP port** (default: `9870` or `50070`): used for the NameNode Web UI.

    Check the HDFS NameNode port configuration in the Catalog properties or in `hdfs-site.xml` under `fe/conf` and `be/conf`, and ensure it is set to the RPC port (`dfs.namenode.rpc-address`), not the HTTP port (`dfs.namenode.http-address`).
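    The hexadecimal decoding above can be reproduced with the MySQL-compatible `hex()` and `unhex()` functions (both are assumed available in your SQL client; Doris provides them):

    ```sql
    SELECT hex(1213486160);   -- '48545450'
    SELECT unhex('48545450'); -- 'HTTP'
    ```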
## DLF Catalog
1. When using the DLF Catalog, if BE encounters an `Invalid address` error while reading JindoFS data, add a mapping from the domain name that appears in the logs to its IP in `/etc/hosts`.
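    For example, if the BE log reports an unresolvable JindoFS domain, add a line like the following to `/etc/hosts` on each BE node (the domain and IP below are illustrative placeholders; use the values from your own logs):

    ```
    192.0.2.10  master-1-1.example.oss-dls.aliyuncs.com
    ```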

i18n/zh-CN/docusaurus-plugin-content-docs/current/faq/lakehouse-faq.md

Lines changed: 39 additions & 0 deletions

i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/faq/lakehouse-faq.md

Lines changed: 39 additions & 0 deletions

i18n/zh-CN/docusaurus-plugin-content-docs/version-4.x/faq/lakehouse-faq.md

Lines changed: 39 additions & 0 deletions

versioned_docs/version-3.x/faq/lakehouse-faq.md

Lines changed: 39 additions & 0 deletions

versioned_docs/version-4.x/faq/lakehouse-faq.md

Lines changed: 39 additions & 0 deletions

0 commit comments
