-
Notifications
You must be signed in to change notification settings - Fork 309
fix: handle ambiguous and non-existent local times #3865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
parthchandra
merged 7 commits into
apache:main
from
matthewalex4:fix-ambiguous-time-early
Apr 10, 2026
Merged
Changes from 3 commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
74f007a
fix: choose earliest time when ambiguous
a2dbf5e
fix: handle non-existent times due to time change
7358c7b
test: add unit tests for ambiguous and non-existent local times
d622e44
refactor: remove unnecessary catch_unwind in tests
8d6e3a3
Merge branch 'main' into fix-ambiguous-time-early
998f975
test: add spark test for timestampntz dst casting
76e3977
refactor: remove unwrap from resolve_local_datetime
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -36,7 +36,7 @@ use arrow::{ | |
| array::{as_dictionary_array, Array, ArrayRef, PrimitiveArray}, | ||
| temporal_conversions::as_datetime, | ||
| }; | ||
| use chrono::{DateTime, Offset, TimeZone}; | ||
| use chrono::{DateTime, LocalResult, NaiveDateTime, Offset, TimeDelta, TimeZone}; | ||
|
|
||
| /// Preprocesses input arrays to add timezone information from Spark to Arrow array datatype or | ||
| /// to apply timezone offset. | ||
|
|
@@ -174,6 +174,19 @@ fn datetime_cast_err(value: i64) -> ArrowError { | |
| )) | ||
| } | ||
|
|
||
| fn resolve_local_datetime(tz: &Tz, local_datetime: NaiveDateTime) -> DateTime<Tz> { | ||
| match tz.from_local_datetime(&local_datetime) { | ||
| LocalResult::Single(dt) => dt, | ||
| LocalResult::Ambiguous(dt, _) => dt, | ||
| LocalResult::None => { | ||
| // Interpret nonexistent local time by shifting from one hour earlier. | ||
| let shift = TimeDelta::hours(1); | ||
| let before = tz.from_local_datetime(&(local_datetime - shift)).unwrap(); | ||
| before + shift | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /// Takes in a Timestamp(Microsecond, None) array and a timezone id, and returns | ||
| /// a Timestamp(Microsecond, Some<_>) array. | ||
| /// The understanding is that the input array has time in the timezone specified in the second | ||
|
|
@@ -196,8 +209,8 @@ fn timestamp_ntz_to_timestamp( | |
| as_datetime::<TimestampMicrosecondType>(value) | ||
| .ok_or_else(|| datetime_cast_err(value)) | ||
| .map(|local_datetime| { | ||
| let datetime: DateTime<Tz> = | ||
| tz.from_local_datetime(&local_datetime).unwrap(); | ||
| let datetime = resolve_local_datetime(&tz, local_datetime); | ||
|
|
||
| datetime.timestamp_micros() | ||
| }) | ||
| })?; | ||
|
|
@@ -215,8 +228,8 @@ fn timestamp_ntz_to_timestamp( | |
| as_datetime::<TimestampMillisecondType>(value) | ||
| .ok_or_else(|| datetime_cast_err(value)) | ||
| .map(|local_datetime| { | ||
| let datetime: DateTime<Tz> = | ||
| tz.from_local_datetime(&local_datetime).unwrap(); | ||
| let datetime = resolve_local_datetime(&tz, local_datetime); | ||
|
|
||
| datetime.timestamp_millis() | ||
| }) | ||
| })?; | ||
|
|
@@ -312,6 +325,19 @@ pub fn unlikely(b: bool) -> bool { | |
| mod tests { | ||
| use super::*; | ||
|
|
||
| fn array_containing(local_datetime: &str) -> ArrayRef { | ||
| let dt = NaiveDateTime::parse_from_str(local_datetime, "%Y-%m-%d %H:%M:%S").unwrap(); | ||
| let ts = dt.and_utc().timestamp_micros(); | ||
| Arc::new(TimestampMicrosecondArray::from(vec![ts])) | ||
| } | ||
|
|
||
| fn micros_for(datetime: &str) -> i64 { | ||
| NaiveDateTime::parse_from_str(datetime, "%Y-%m-%d %H:%M:%S") | ||
| .unwrap() | ||
| .and_utc() | ||
| .timestamp_micros() | ||
| } | ||
|
|
||
| #[test] | ||
| fn test_build_bool_state() { | ||
| let mut builder = BooleanBufferBuilder::new(0); | ||
|
|
@@ -330,4 +356,40 @@ mod tests { | |
| ); | ||
| assert_eq!(last, build_bool_state(&mut builder, &EmitTo::All)); | ||
| } | ||
|
|
||
| #[test] | ||
| fn test_timestamp_ntz_to_timestamp_handles_non_existent_time() { | ||
| let result = std::panic::catch_unwind(|| { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do these tests still need
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No not necessary, removed in d622e44 |
||
| timestamp_ntz_to_timestamp( | ||
| array_containing("2024-03-31 01:30:00"), | ||
| "Europe/London", | ||
| None, | ||
| ) | ||
| }); | ||
|
|
||
| assert!(result.is_ok()); | ||
| let output = result.unwrap().unwrap(); | ||
| assert_eq!( | ||
| as_primitive_array::<TimestampMicrosecondType>(&output).value(0), | ||
| micros_for("2024-03-31 01:30:00") | ||
| ); | ||
| } | ||
|
|
||
| #[test] | ||
| fn test_timestamp_ntz_to_timestamp_handles_ambiguous_time() { | ||
| let result = std::panic::catch_unwind(|| { | ||
| timestamp_ntz_to_timestamp( | ||
| array_containing("2024-10-27 01:30:00"), | ||
| "Europe/London", | ||
| None, | ||
| ) | ||
| }); | ||
|
|
||
| assert!(result.is_ok()); | ||
| let output = result.unwrap().unwrap(); | ||
| assert_eq!( | ||
| as_primitive_array::<TimestampMicrosecondType>(&output).value(0), | ||
| micros_for("2024-10-27 00:30:00") | ||
| ); | ||
| } | ||
| } | ||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 1-hour shift works for standard DST transitions, but some timezones have non-standard gaps (e.g., Australia/Lord_Howe has a 30-minute DST transition). Yes, this is an edge case so it may be fine to ignore, but we should at least document this as an incompatibility.
https://www.timeanddate.com/time/change/australia/lord-howe-island
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment in d622e44
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm handling the DST gap in #3884. Perhaps you can wait for this to pass CI and then re-use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider doing something like in string.rs
parse_timestamp_to_micros. Something like -This eliminates the unwrap(), handles non-standard DST correctly, and reuses a pattern that already has test coverage in Comet.