Discussion
Jabil
US
Last activity: 11 Jun 2022 12:09 EDT
Repository rule definition design flaw
Greetings Community and Pega. I recently created a workaround to solve an S3 Repository connection issue. The Community has many (mostly unsolved) questions with similar traits, and I think I know why. I hope this will be picked up as a product enhancement.
The Repository rule specifies a "Root path". This is analogous to a filesystem root folder, but it is not literal: S3 is an object store, not a filesystem. The reality is visible in S3 bucket policy, e.g. this example from Amazon's tutorials...
{
  "Version": "2012-10-17",
  "Id": "123",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::DOC-EXAMPLE-BUCKET/taxdocuments/*",
      "Condition": { "Null": { "aws:MultiFactorAuthAge": true }}
    }
  ]
}
Turn that "Deny" into "Allow" and the caller may submit anything with a key prefix of "taxdocuments". There is no folder, only an object Resource name. If there is anything that looks like a folder, it's a zero-byte object with that name. This object is not required for callers to write to names underneath it, e.g. "taxdocuments/2022/return.pdf"; S3 does not consider the "folder" object at all when processing keys with that prefix.
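The flat-namespace behavior can be sketched locally. This toy FlatKeyStore class is purely illustrative (it has nothing to do with the AWS SDK), but it models the semantics described above: keys live in one flat map, and "folders" exist only as a prefix convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of an S3 bucket's flat key namespace (illustrative only).
public class FlatKeyStore {
    private final TreeMap<String, byte[]> objects = new TreeMap<>();

    public void put(String key, byte[] content) {
        objects.put(key, content);
    }

    // Exact-key lookup: a "folder" key like "taxdocuments/" only exists
    // if someone explicitly wrote a zero-byte object with that name.
    public boolean exists(String key) {
        return objects.containsKey(key);
    }

    // Prefix listing is how "folder contents" are simulated.
    public List<String> listByPrefix(String prefix) {
        List<String> matches = new ArrayList<>();
        for (String key : objects.keySet()) {
            if (key.startsWith(prefix)) {
                matches.add(key);
            }
        }
        return matches;
    }
}
```

Writing "taxdocuments/2022/return.pdf" succeeds with no "taxdocuments/" key present, and listing by prefix still finds it - exactly the situation my ETL-swept bucket was in.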
In my case, an ETL process kept sweeping away the root folder. The Repository rule failed validation and the dependent Repository API Data Pages failed until the folder was restored manually. As a workaround I wrote a Function with code similar to this (it's still on the crude side)...
/**
 * Function input String parameters...
 *
 * Path
 * Bucket
 * AccessKeyID
 * SecretAccessKey
 * KMSKeyID
 * S3Region
 */
boolean retVal = false;
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(0);
InputStream emptyContent = new ByteArrayInputStream(new byte[0]);
AWSCredentials credentials = new BasicAWSCredentials(AccessKeyID, SecretAccessKey);
AmazonS3 s3 = null;
try {
    s3 = AmazonS3ClientBuilder.standard()
        .withCredentials(new AWSStaticCredentialsProvider(credentials))
        .withRegion(S3Region).build();
    // Ensure ending slash for Pega's sake
    Path = Path.endsWith("/") ? Path : Path + "/";
    // Try up to 5 times to create the pseudo-folder object
    for (int i = 0; i < 5; i++) {
        // See if we have anything to do
        if (s3.doesObjectExist(Bucket, Path)) {
            retVal = true;
            break;
        }
        try {
            PutObjectRequest putObjectRequest = new PutObjectRequest(
                Bucket, Path, emptyContent, metadata)
                .withSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams(KMSKeyID));
            s3.putObject(putObjectRequest);
            // Try to make sure S3 completes before returning to our caller
            try {
                Thread.sleep(250);
                if (s3.doesObjectExist(Bucket, Path)) {
                    retVal = true;
                    break;
                }
            } catch (InterruptedException ie1) {
                // Just return
                oLog.error(ie1);
                break;
            }
        } catch (AmazonClientException e) {
            oLog.error(e);
            if (i < 4) {
                try {
                    Thread.sleep(250); // pause a bit before trying again
                } catch (InterruptedException ie2) {
                    // Stop retrying and return
                    oLog.error(ie2);
                    break;
                }
            }
        }
    }
} finally {
    // Guard against a failed client build
    if (s3 != null) {
        s3.shutdown();
    }
}
return retVal;
I call this just before using the Repository API to deliver BIX results to their destination. In early testing it seems to work, but there's a lot to hate. I'm making crude attempts to confirm that the S3 "folder" creation is actually visible before returning to the caller; testing confirms this is necessary - it's not instantaneous. The waiting piece obviously needs help.
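One improvement on the waiting piece would be bounded exponential backoff instead of fixed 250 ms sleeps. The helper below is a hypothetical sketch, not part of my Function, and the base/cap values are arbitrary illustrations. (The SDK's built-in waiter support, s3.waiters().objectExists(), may also be a cleaner way to poll, though I haven't tried it here.)

```java
// Hypothetical sketch: bounded exponential backoff delays to replace
// the fixed 250 ms sleeps. Base and cap values are illustrative only.
public class Backoff {
    public static long delayMillis(int attempt, long baseMillis, long capMillis) {
        // Delay doubles each attempt: base, 2*base, 4*base, ... capped at capMillis.
        long delay = baseMillis << Math.min(attempt, 20); // limit shift to avoid overflow
        return Math.min(delay, capMillis);
    }
}
```

Each retry i would then sleep Backoff.delayMillis(i, 250, 4000) rather than a flat 250 ms, giving S3 progressively more time on later attempts without stalling the happy path.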
Also, I'm unhappily forced to expose the S3 credential pieces outside of the Authentication and Repository rules. Note these are the same credentials used by the Repository rule, which validates after this Function succeeds.
The main point, though, is that I don't have an S3 connection issue. I have a Repository rule that inappropriately checks for object existence instead of purely testing permissions. If downstream Pega processes that mimic filesystems really demand the pseudo-folder exists, then the Repository rule should have an option to (re)create it on-demand. Alternatively, it could just gracefully simulate the existence of the root folder object without intruding into the bucket at all.
Hope this makes sense and is helpful to someone.