Conversation
There was a problem hiding this comment.
Code Review
This pull request prepares the project for version 0.2.0 by updating dependencies, Docker configurations, and installation scripts. Key changes include switching from tag-based to branch-based checkouts in the Dockerfile, adjusting GPU process counts in configurations, and introducing a TWINKLE_TRUST_REMOTE_CODE environment variable to control remote code execution during dataset loading. Review feedback identified critical logic errors in boolean environment variable parsing and missing os imports in src/twinkle/dataset/base.py, as well as a potential IndexError in the tensor conversion logic within src/twinkle/processor/base.py.
| """ | ||
|
|
||
| def __init__(self, dataset_meta: DatasetMeta, **kwargs): | ||
| trust_remote_code = bool(os.environ.get('TWINKLE_TRUST_REMOTE_CODE', '1')) |
There was a problem hiding this comment.
There are two issues with this line:
- Logic Error: In Python,
bool()on any non-empty string returnsTrue. Sinceos.environ.getreturns a string,bool("0")will beTrue, meaning the environment variable cannot be used to settrust_remote_codetoFalseas intended. - NameError: The
osmodule is not imported in this file (onlyos.pathis), so accessingos.environwill raise aNameErrorat runtime.
| trust_remote_code = bool(os.environ.get('TWINKLE_TRUST_REMOTE_CODE', '1')) | |
| trust_remote_code = os.environ.get('TWINKLE_TRUST_REMOTE_CODE', '1') == '1' |
| Args: | ||
| dataset_meta: The dataset_meta information of the loaded dataset. | ||
| """ | ||
| trust_remote_code = bool(os.environ.get('TWINKLE_TRUST_REMOTE_CODE', '1')) |
There was a problem hiding this comment.
Same issues as in the constructor: the bool() conversion on a string does not work for boolean flags, and the os module is missing from the imports.
| trust_remote_code = bool(os.environ.get('TWINKLE_TRUST_REMOTE_CODE', '1')) | |
| trust_remote_code = os.environ.get('TWINKLE_TRUST_REMOTE_CODE', '1') == '1' |
| elif (isinstance(value, list) and isinstance(value[0], | ||
| (int, float, np.number))) or key == 'position_ids': |
There was a problem hiding this comment.
Accessing value[0] without verifying that the list is non-empty will raise an IndexError if value is []. It is safer to check the list's truthiness (which is False for empty lists) before indexing.
| elif (isinstance(value, list) and isinstance(value[0], | |
| (int, float, np.number))) or key == 'position_ids': | |
| elif (isinstance(value, list) and value and isinstance(value[0], | |
| (int, float, np.number))) or key == 'position_ids': |
PR type
PR information
Write the detail information belongs to this PR.
Experiment results
Paste your experiment result here(if needed).