-
Notifications
You must be signed in to change notification settings - Fork 488
JAMES-4156 refactoring Blobstore deleted message vault for appending messages into single bucket #2902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
JAMES-4156 refactoring Blobstore deleted message vault for appending messages into single bucket #2902
Changes from 9 commits
6f18cc9
4591193
51f4491
f53bf5d
4f8649f
1533006
eba0223
205cbc6
d6195e9
a8e419b
febcdf0
e21ce2b
07768b0
1cc7a87
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -31,32 +31,38 @@ | |
| import com.google.common.base.Preconditions; | ||
|
|
||
| public class VaultConfiguration { | ||
| public static final String DEFAULT_SINGLE_BUCKET_NAME = "james-deleted-message-vault"; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will this allow nesting below root ?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No the bucket is (by default)
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think using the term Bucket or bucket name for the blobstore was a mistake. it overlaps in a bad way with the s3 notions. Here you are manipulating a blobstore bucket. On a more direct musing I think the previous design influences the naming too much here, the term
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
+1 But as said CF previous refactoring that may not be something easy that we are willing to change.
I don't want this to mess up with mailbox GC. I'd rather have a separate blob store bucket if given the choice. Also a separate blob store bucket makes loads of sense as the storage constraints are not the standard james ones (glacier?)
Cf comment above.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought we agreed that this work for now just allows going from multiple S3 buckets to just one for the vault only, but it does not put all blobs under the root bucket. The goal in the end would be to put all blobs under one S3 bucket yes. But that's not what this PR intends to do. That would require more changes and discussions. Is it still unclear on that point @jeantil ?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
no that's perfectly clear
This can be achieved today using the I think the constant should not contain the word As I understand it this is the Hence my proposal to call it
I'm not sure how my proposed name change or how using the prefix/namespace features of blob.properties changes would afffect the mailbox GC ... I propose we discuss this with actual examples on the ADR. This goes beyond my intention for the PR. |
||
| public static final VaultConfiguration DEFAULT = | ||
| new VaultConfiguration(false, ChronoUnit.YEARS.getDuration(), DefaultMailboxes.RESTORED_MESSAGES); | ||
| new VaultConfiguration(false, ChronoUnit.YEARS.getDuration(), DefaultMailboxes.RESTORED_MESSAGES, DEFAULT_SINGLE_BUCKET_NAME); | ||
| public static final VaultConfiguration ENABLED_DEFAULT = | ||
| new VaultConfiguration(true, ChronoUnit.YEARS.getDuration(), DefaultMailboxes.RESTORED_MESSAGES); | ||
| new VaultConfiguration(true, ChronoUnit.YEARS.getDuration(), DefaultMailboxes.RESTORED_MESSAGES, DEFAULT_SINGLE_BUCKET_NAME); | ||
|
|
||
| public static VaultConfiguration from(Configuration propertiesConfiguration) { | ||
| Duration retentionPeriod = Optional.ofNullable(propertiesConfiguration.getString("retentionPeriod")) | ||
| .map(string -> DurationParser.parse(string, ChronoUnit.DAYS)) | ||
| .orElse(DEFAULT.getRetentionPeriod()); | ||
| String restoreLocation = Optional.ofNullable(propertiesConfiguration.getString("restoreLocation")) | ||
| .orElse(DEFAULT.getRestoreLocation()); | ||
| String singleBucketName = Optional.ofNullable(propertiesConfiguration.getString("singleBucketName")) | ||
| .orElse(DEFAULT.getSingleBucketName()); | ||
| boolean enabled = propertiesConfiguration.getBoolean("enabled", false); | ||
| return new VaultConfiguration(enabled, retentionPeriod, restoreLocation); | ||
| return new VaultConfiguration(enabled, retentionPeriod, restoreLocation, singleBucketName); | ||
| } | ||
|
|
||
| private final boolean enabled; | ||
| private final Duration retentionPeriod; | ||
| private final String restoreLocation; | ||
| private final String singleBucketName; | ||
|
|
||
| VaultConfiguration(boolean enabled, Duration retentionPeriod, String restoreLocation) { | ||
| VaultConfiguration(boolean enabled, Duration retentionPeriod, String restoreLocation, String singleBucketName) { | ||
| this.enabled = enabled; | ||
| Preconditions.checkNotNull(retentionPeriod); | ||
| Preconditions.checkNotNull(restoreLocation); | ||
| Preconditions.checkNotNull(singleBucketName); | ||
|
|
||
| this.retentionPeriod = retentionPeriod; | ||
| this.restoreLocation = restoreLocation; | ||
| this.singleBucketName = singleBucketName; | ||
| } | ||
|
|
||
| public boolean isEnabled() { | ||
|
|
@@ -71,20 +77,25 @@ public String getRestoreLocation() { | |
| return restoreLocation; | ||
| } | ||
|
|
||
| public String getSingleBucketName() { | ||
| return singleBucketName; | ||
| } | ||
|
|
||
| @Override | ||
| public final boolean equals(Object o) { | ||
| if (o instanceof VaultConfiguration) { | ||
| VaultConfiguration that = (VaultConfiguration) o; | ||
|
|
||
| return Objects.equals(this.retentionPeriod, that.retentionPeriod) | ||
| && Objects.equals(this.restoreLocation, that.restoreLocation) | ||
| && Objects.equals(this.enabled, that.enabled); | ||
| && Objects.equals(this.enabled, that.enabled) | ||
| && Objects.equals(this.singleBucketName, that.singleBucketName); | ||
| } | ||
| return false; | ||
| } | ||
|
|
||
| @Override | ||
| public final int hashCode() { | ||
| return Objects.hash(retentionPeriod, restoreLocation, enabled); | ||
| return Objects.hash(retentionPeriod, restoreLocation, enabled, singleBucketName); | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,80 @@ | ||||||||
| /**************************************************************** | ||||||||
| * Licensed to the Apache Software Foundation (ASF) under one * | ||||||||
| * or more contributor license agreements. See the NOTICE file * | ||||||||
| * distributed with this work for additional information * | ||||||||
| * regarding copyright ownership. The ASF licenses this file * | ||||||||
| * to you under the Apache License, Version 2.0 (the * | ||||||||
| * "License"); you may not use this file except in compliance * | ||||||||
| * with the License. You may obtain a copy of the License at * | ||||||||
| * * | ||||||||
| * http://www.apache.org/licenses/LICENSE-2.0 * | ||||||||
| * * | ||||||||
| * Unless required by applicable law or agreed to in writing, * | ||||||||
| * software distributed under the License is distributed on an * | ||||||||
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY * | ||||||||
| * KIND, either express or implied. See the License for the * | ||||||||
| * specific language governing permissions and limitations * | ||||||||
| * under the License. * | ||||||||
| ****************************************************************/ | ||||||||
|
|
||||||||
| package org.apache.james.vault.blob; | ||||||||
|
|
||||||||
| import java.time.Clock; | ||||||||
| import java.time.LocalDate; | ||||||||
| import java.time.ZonedDateTime; | ||||||||
| import java.util.Optional; | ||||||||
| import java.util.UUID; | ||||||||
| import java.util.regex.Matcher; | ||||||||
| import java.util.regex.Pattern; | ||||||||
|
|
||||||||
| import jakarta.inject.Inject; | ||||||||
|
|
||||||||
| import org.apache.james.blob.api.BlobId; | ||||||||
| import org.apache.james.blob.api.PlainBlobId; | ||||||||
|
|
||||||||
| public class BlobIdTimeGenerator { | ||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand why the whole PR goes around the BlobId.Factory design 🤷
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please re read the interface. BlobId.Factory is about *parsing* BlobId This work is about *generating* them. Maybe I am missing something, despite investing several hours on the topic. Maybe a hand on example could help?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe I completely miss the point of BlobId.Factory. My understanding is that the Factory is responsible for creating instances of BlobId that point to a specific object (generate BlobIds), either by wrapping/decorating In this case you are encoding custom paths as a PlainBlobId bypassing the factory which is not consistent with how things are done in the rest of the codebase. I was kinda expecting a TimePrefixedBlobIdFactory (don't focus on this specific name) which would encapsulate the path manipulation. I may misunderstand by your path manipulation appears to be very similar to what is done in
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hello @jeantil
I see no generate method. Generation is an actual responsibility of the BlobStore, or of its caller. Proof:
At the minimum a concrete clarification would be a Please note that in the example taken: The decision to rely on a MailKey as the storage identifier is not ecoded in the BlobId.Factory either.
TBH as we are not actively using the date information encoded in the object name this present little added value to operate of a "typed" blobId. The only use of information encoded by the blobId to a useful process is to my knowledge the GC to leverage a grace period cf Line 277 in aef61d6
Also my consern about "typed" BlobId is that it interoperate badly. The beauty of the proposed design here (PlainBloId factory) is that we naturally handle historical data without needing some complex fallback logic for an information that is not actually needed. The So how do we deal with this:
Cheers. |
||||||||
| public static final String BLOB_ID_GENERATING_FORMAT = "%d/%02d/%s"; | ||||||||
| public static final Pattern BLOB_ID_TIME_PATTERN = Pattern.compile("^(\\d{4})/(\\d{2})/(.*)$"); | ||||||||
|
|
||||||||
| private final BlobId.Factory blobIdFactory; | ||||||||
| private final Clock clock; | ||||||||
|
|
||||||||
| @Inject | ||||||||
| public BlobIdTimeGenerator(BlobId.Factory blobIdFactory, Clock clock) { | ||||||||
| this.blobIdFactory = blobIdFactory; | ||||||||
| this.clock = clock; | ||||||||
| } | ||||||||
|
|
||||||||
| BlobId currentBlobId() { | ||||||||
| ZonedDateTime now = ZonedDateTime.now(clock); | ||||||||
| int month = now.getMonthValue(); | ||||||||
| int year = now.getYear(); | ||||||||
|
|
||||||||
| return new PlainBlobId(String.format(BLOB_ID_GENERATING_FORMAT, year, month, blobIdFactory.of(UUID.randomUUID().toString()).asString())); | ||||||||
| } | ||||||||
|
|
||||||||
| Optional<ZonedDateTime> blobIdEndTime(BlobId blobId) { | ||||||||
| return Optional.of(BLOB_ID_TIME_PATTERN.matcher(blobId.asString())) | ||||||||
| .filter(Matcher::matches) | ||||||||
| .map(matcher -> { | ||||||||
| int year = Integer.parseInt(matcher.group(1)); | ||||||||
| int month = Integer.parseInt(matcher.group(2)); | ||||||||
| return firstDayOfNextMonth(year, month); | ||||||||
| }); | ||||||||
| } | ||||||||
|
|
||||||||
| private ZonedDateTime firstDayOfNextMonth(int year, int month) { | ||||||||
| return LocalDate.of(year, month, 1).plusMonths(1).atStartOfDay(clock.getZone()); | ||||||||
| } | ||||||||
|
|
||||||||
| public BlobId toDeletedMessageBlobId(String blobId) { | ||||||||
| return Optional.of(BLOB_ID_TIME_PATTERN.matcher(blobId)) | ||||||||
| .filter(Matcher::matches) | ||||||||
| .map(matcher -> { | ||||||||
| int year = Integer.parseInt(matcher.group(1)); | ||||||||
| int month = Integer.parseInt(matcher.group(2)); | ||||||||
| String subBlobId = matcher.group(3); | ||||||||
| return (BlobId) new PlainBlobId(String.format(BLOB_ID_GENERATING_FORMAT, year, month, blobIdFactory.parse(subBlobId).asString())); | ||||||||
| }).orElseGet(() -> blobIdFactory.parse(blobId)); | ||||||||
| } | ||||||||
| } | ||||||||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not have a custom BlobId.Factory implementation here to avoid changing the API and affecting the cassandra variant for a S3 only problem ?
check https://github.com/apache/james-project/blob/master/server/mailrepository/mailrepository-blob/src/main/scala/org/apache/james/mailrepository/blob/BlobMailRepositoryFactory.scala#L29 for an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @jeantil
-> I wonder if this piece of coude could not work with PlainBlobId as the cassandra implem is blobId transparent. That's a technical nitpick that do not address the core of your remarks but could be a nice simplification I presume. Worth testing...
On the principle mostly agree. This would mean acting at the blobStoreDAO level and not on the BlobStore. Which do not seem like a major issue to me.
I;ll try to push a fix in this direction.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 not sure I follow : it does require creating a custom blobstore instance but it has no impact on the blobstoreDAO itself. This is how we did it for the blob backed mailrepository
It could arguably be managed at the injection layer by using a Qualified blobid factory instance ( this is a limitation of the BlobMailRepositoryFactory which forces usage of a passthrough blobstore and cannot use deduplication but I was so happy to have something working I stopped without the deduplication, also deduplication makes less sense for a mail repository which I don't expect to store mails for years )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see
So re-assembling a ad-hoc blobStore tailor made for the feature.
Interesting.
Sorry
do like ...comment was hard for me to interprete.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my apologies, I should have been clearer