This repository was archived by the owner on Jun 2, 2025. It is now read-only.

Commit 63e1dbf

Author: mahithsuresh
Merge pull request #16 from jinpengqi/master
Add JSONLines Support to the SparkML Container
2 parents fda745a + 2926aa0, commit 63e1dbf

11 files changed: +598 additions, -28 deletions

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ WORKDIR /sagemaker-sparkml-model-server
 
 RUN mvn clean package
 
-RUN cp ./target/sparkml-serving-2.3.jar /usr/local/lib/sparkml-serving-2.3.jar
+RUN cp ./target/sparkml-serving-2.4.jar /usr/local/lib/sparkml-serving-2.4.jar
 RUN cp ./serve.sh /usr/local/bin/serve.sh
 
 RUN chmod a+x /usr/local/bin/serve.sh

README.md

Lines changed: 18 additions & 18 deletions
@@ -223,20 +223,20 @@ Calling `CreateModel` is required for creating a `Model` in SageMaker with this
 SageMaker works with Docker images stored in [Amazon ECR](https://aws.amazon.com/ecr/). SageMaker team has prepared and uploaded the Docker images for SageMaker SparkML Serving Container in all regions where SageMaker operates.
 Region to ECR container URL mapping can be found below. For a mapping from Region to Region Name, please see [here](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html).
 
-* us-west-1 = 746614075791.dkr.ecr.us-west-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* us-west-2 = 246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.2
-* us-east-1 = 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* us-east-2 = 257758044811.dkr.ecr.us-east-2.amazonaws.com/sagemaker-sparkml-serving:2.2
-* ap-northeast-1 = 354813040037.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* ap-northeast-2 = 366743142698.dkr.ecr.ap-northeast-2.amazonaws.com/sagemaker-sparkml-serving:2.2
-* ap-southeast-1 = 121021644041.dkr.ecr.ap-southeast-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* ap-southeast-2 = 783357654285.dkr.ecr.ap-southeast-2.amazonaws.com/sagemaker-sparkml-serving:2.2
-* ap-south-1 = 720646828776.dkr.ecr.ap-south-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* eu-west-1 = 141502667606.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* eu-west-2 = 764974769150.dkr.ecr.eu-west-2.amazonaws.com/sagemaker-sparkml-serving:2.2
-* eu-central-1 = 492215442770.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* ca-central-1 = 341280168497.dkr.ecr.ca-central-1.amazonaws.com/sagemaker-sparkml-serving:2.2
-* us-gov-west-1 = 414596584902.dkr.ecr.us-gov-west-1.amazonaws.com/sagemaker-sparkml-serving:2.2
+* us-west-1 = 746614075791.dkr.ecr.us-west-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* us-west-2 = 246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.4
+* us-east-1 = 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* us-east-2 = 257758044811.dkr.ecr.us-east-2.amazonaws.com/sagemaker-sparkml-serving:2.4
+* ap-northeast-1 = 354813040037.dkr.ecr.ap-northeast-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* ap-northeast-2 = 366743142698.dkr.ecr.ap-northeast-2.amazonaws.com/sagemaker-sparkml-serving:2.4
+* ap-southeast-1 = 121021644041.dkr.ecr.ap-southeast-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* ap-southeast-2 = 783357654285.dkr.ecr.ap-southeast-2.amazonaws.com/sagemaker-sparkml-serving:2.4
+* ap-south-1 = 720646828776.dkr.ecr.ap-south-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* eu-west-1 = 141502667606.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* eu-west-2 = 764974769150.dkr.ecr.eu-west-2.amazonaws.com/sagemaker-sparkml-serving:2.4
+* eu-central-1 = 492215442770.dkr.ecr.eu-central-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* ca-central-1 = 341280168497.dkr.ecr.ca-central-1.amazonaws.com/sagemaker-sparkml-serving:2.4
+* us-gov-west-1 = 414596584902.dkr.ecr.us-gov-west-1.amazonaws.com/sagemaker-sparkml-serving:2.4
 
 With [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk)
 ------------------------------------------------------------------------

@@ -263,7 +263,7 @@ First you need to ensure that have installed [Docker](https://www.docker.com/) o
 In order to build the Docker image, you need to run a single Docker command:
 
 ```
-docker build -t sagemaker-sparkml-serving:2.2 .
+docker build -t sagemaker-sparkml-serving:2.4 .
 ```
 
 #### Running the image locally

@@ -272,7 +272,7 @@ In order to run the Docker image, you need to run the following command. Please
 The command will start the server on port 8080 and will also pass the schema as an environment variable to the Docker container. Alternatively, you can edit the `Dockerfile` to add `ENV SAGEMAKER_SPARKML_SCHEMA=schema` as well before building the Docker image.
 
 ```
-docker run -p 8080:8080 -e SAGEMAKER_SPARKML_SCHEMA=schema -v /tmp/model:/opt/ml/model sagemaker-sparkml-serving:2.2 serve
+docker run -p 8080:8080 -e SAGEMAKER_SPARKML_SCHEMA=schema -v /tmp/model:/opt/ml/model sagemaker-sparkml-serving:2.4 serve
 ```
 
 #### Invoking with a payload

@@ -287,7 +287,7 @@ or
 curl -i -H "content-type:application/json" -d "{\"data\":[feature_1,\"feature_2\",feature_3]}" http://localhost:8080/invocations
 ```
 
-The `Dockerfile` can be found at the root directory of the package. SageMaker SparkML Serving Container tags the Docker images using the Spark major version it is compatible with. Right now, it only supports Spark 2.2 and as a result, the Docker image is tagged with 2.2.
+The `Dockerfile` can be found at the root directory of the package. SageMaker SparkML Serving Container tags the Docker images using the Spark major version it is compatible with. Right now, it only supports Spark 2.4 and as a result, the Docker image is tagged with 2.4.
 
 In order to save the effort of building the Docker image everytime you are making a code change, you can also install [Maven](http://maven.apache.org/) and run `mvn clean package` at your project root to verify if the code is compiling fine and unit tests are running without any issue.
 

@@ -310,7 +310,7 @@ aws ecr get-login --region us-west-2 --registry-ids 246618743249 --no-include-em
 * Download the Docker image with the following command:
 
 ```
-docker pull 246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.2
+docker pull 246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.4
 ```
 
 For running the Docker image, please see the Running the image locally section from above.
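Note that the README hunks above only bump the image tag; they do not document the new JSONLines endpoint that this commit introduces in the controller. As a hedged sketch only (not part of the commit's README changes), assuming the container is running locally as shown in the README and accepts the `application/jsonlines` content type declared by the new controller method, an invocation might look like the following, with `feature_*` placeholders mirroring the existing JSON example:

```
curl -i -H "content-type:application/jsonlines" \
  --data-binary $'{"data":[feature_1,"feature_2",feature_3]}\n{"data":[feature_1,"feature_2",feature_3]}' \
  http://localhost:8080/invocations
```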

ci/buildspec.yml

Lines changed: 3 additions & 3 deletions
@@ -9,10 +9,10 @@ phases:
     commands:
       - echo Build started on `date`
       - echo Building the Docker image...
-      - docker build -t sagemaker-sparkml-serving:2.3 .
-      - docker tag sagemaker-sparkml-serving:2.3 515193369038.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.3
+      - docker build -t sagemaker-sparkml-serving:2.4 .
+      - docker tag sagemaker-sparkml-serving:2.4 515193369038.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.4
   post_build:
     commands:
       - echo Build completed on `date`
       - echo Pushing the Docker image...
-      - docker push 515193369038.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.3
+      - docker push 515193369038.dkr.ecr.us-west-2.amazonaws.com/sagemaker-sparkml-serving:2.4

pom.xml

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@
     <modelVersion>4.0.0</modelVersion>
     <groupId>org.amazonaws.sagemaker</groupId>
     <artifactId>sparkml-serving</artifactId>
-    <version>2.3</version>
+    <version>2.4</version>
     <build>
         <plugins>
             <plugin>

@@ -154,7 +154,7 @@
         <dependency>
             <groupId>ml.combust.mleap</groupId>
             <artifactId>mleap-runtime_2.11</artifactId>
-            <version>0.13.0</version>
+            <version>0.14.0</version>
         </dependency>
         <dependency>
             <groupId>org.apache.commons</groupId>

serve.sh

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 #!/bin/bash
 # This is needed to make sure Java correctly detects CPU/Memory set by the container limits
-java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -jar /usr/local/lib/sparkml-serving-2.3.jar
+java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -jar /usr/local/lib/sparkml-serving-2.4.jar

src/main/java/com/amazonaws/sagemaker/controller/ServingController.java

Lines changed: 106 additions & 3 deletions
@@ -21,6 +21,7 @@
 
 import com.amazonaws.sagemaker.dto.BatchExecutionParameter;
 import com.amazonaws.sagemaker.dto.DataSchema;
+import com.amazonaws.sagemaker.dto.SageMakerRequestListObject;
 import com.amazonaws.sagemaker.dto.SageMakerRequestObject;
 import com.amazonaws.sagemaker.helper.DataConversionHelper;
 import com.amazonaws.sagemaker.helper.ResponseHelper;

@@ -29,11 +30,13 @@
 import com.amazonaws.sagemaker.utils.ScalaUtils;
 import com.amazonaws.sagemaker.utils.SystemUtils;
 import com.fasterxml.jackson.core.JsonProcessingException;
+import com.fasterxml.jackson.databind.JsonMappingException;
 import com.fasterxml.jackson.databind.ObjectMapper;
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Preconditions;
 import com.google.common.collect.Lists;
 import java.io.IOException;
+import java.util.Arrays;
 import java.util.List;
 import ml.combust.mleap.runtime.frame.ArrayRow;
 import ml.combust.mleap.runtime.frame.DefaultLeapFrame;

@@ -102,7 +105,7 @@ public ResponseEntity returnBatchExecutionParameter() throws JsonProcessingExcep
      * Implements the invocations POST API for application/json input
      *
      * @param sro, the request object
-     * @param accept, accept parameter from request
+     * @param accept, indicates the content types that the http method is able to understand
      * @return ResponseEntity with body as the expected payload JSON & proper statuscode based on the input
      */
     @RequestMapping(path = "/invocations", method = POST, consumes = MediaType.APPLICATION_JSON_VALUE)

@@ -123,10 +126,10 @@ public ResponseEntity<String> transformRequestJson(@RequestBody final SageMakerR
     }
 
     /**
-     * Implements the invocations POST API for application/json input
+     * Implements the invocations POST API for text/csv input
      *
      * @param csvRow, data in row format in CSV
-     * @param accept, accept parameter from request
+     * @param accept, indicates the content types that the http method is able to understand
      * @return ResponseEntity with body as the expected payload JSON & proper statuscode based on the input
      */
     @RequestMapping(path = "/invocations", method = POST, consumes = AdditionalMediaType.TEXT_CSV_VALUE)

@@ -148,6 +151,40 @@ public ResponseEntity<String> transformRequestCsv(@RequestBody final byte[] csvR
         }
     }
 
+    /**
+     * Implements the invocations POST API for application/jsonlines input
+     *
+     * @param jsonLines, lines of json values
+     * @param accept, indicates the content types that the http method is able to understand
+     * @return ResponseEntity with body as the expected payload JSON & proper statuscode based on the input
+     */
+    @RequestMapping(path = "/invocations", method = POST, consumes = AdditionalMediaType.APPLICATION_JSONLINES_VALUE)
+    public ResponseEntity<String> transformRequestJsonLines(
+        @RequestBody final byte[] jsonLines,
+        @RequestHeader(value = HttpHeaders.ACCEPT, required = false)
+            final String accept) {
+
+        if (jsonLines == null) {
+            LOG.error("Input passed to the request is null");
+            return ResponseEntity.badRequest().build();
+
+        } else if (jsonLines.length == 0) {
+
+            LOG.error("Input passed to the request is empty");
+            return ResponseEntity.noContent().build();
+        }
+
+        try {
+            final String acceptVal = this.retrieveAndVerifyAccept(accept);
+            return this.processInputDataForJsonLines(new String(jsonLines), acceptVal);
+
+        } catch (final Exception ex) {
+
+            LOG.error("Error in processing current request", ex);
+            return ResponseEntity.badRequest().body(ex.getMessage());
+        }
+    }
+
     @VisibleForTesting
     protected String retrieveAndVerifyAccept(final String acceptFromRequest) {
         final String acceptVal = checkEmptyAccept(acceptFromRequest) ? SystemUtils

@@ -181,6 +218,72 @@ private ResponseEntity<String> processInputData(final List<Object> inputData, fi
 
     }
 
+    /**
+     * Helper method to interpret the JSONLines input and return the response in the expected output format.
+     *
+     * @param jsonLinesAsString
+     *         The JSON lines input.
+     *
+     * @param acceptVal
+     *         The output format in which the response is to be returned.
+     *
+     * @return
+     *         The transformed output for the JSONlines input.
+     *
+     * @throws IOException
+     *         If there is an exception during object mapping and validation.
+     *
+     */
+    ResponseEntity<String> processInputDataForJsonLines(
+        final String jsonLinesAsString, final String acceptVal) throws IOException {
+
+        final String lines[] = jsonLinesAsString.split("\\r?\\n");
+        final ObjectMapper mapper = new ObjectMapper();
+
+        // first line is special since it could contain the schema as well. Extract the schema.
+        final SageMakerRequestObject firstLine = mapper.readValue(lines[0], SageMakerRequestObject.class);
+        final DataSchema schema = this.retrieveAndVerifySchema(firstLine.getSchema(), mapper);
+
+        List<List<Object>> inputDatas = Lists.newArrayList();
+
+        for(String jsonStringLine : lines) {
+            try {
+
+                final SageMakerRequestListObject sro = mapper.readValue(jsonStringLine, SageMakerRequestListObject.class);
+
+                for(int idx = 0; idx < sro.getData().size(); ++idx) {
+                    inputDatas.add(sro.getData().get(idx));
+                }
+
+            } catch (final JsonMappingException ex) {
+
+                final SageMakerRequestObject sro = mapper.readValue(jsonStringLine, SageMakerRequestObject.class);
+                inputDatas.add(sro.getData());
+            }
+        }
+
+        List<ResponseEntity<String>> responseList = Lists.newArrayList();
+
+        // Process each input separately and add response to a list
+        for (int idx = 0; idx < inputDatas.size(); ++idx) {
+            responseList.add(this.processInputData(inputDatas.get(idx), schema, acceptVal));
+        }
+
+        // Merge response body to a new output response
+        List<List<String>> bodyList = Lists.newArrayList();
+
+        // All response should be valid if no exception got catch
+        // which all headers should be the same and extract the first one to construct responseEntity
+        HttpHeaders headers = responseList.get(0).getHeaders();
+
+        //combine body in responseList
+        for (ResponseEntity<String> response: responseList) {
+            bodyList.add(Lists.newArrayList(response.getBody()));
+        }
+
+        return ResponseEntity.ok().headers(headers).body(bodyList.toString());
+    }
+
     private boolean checkEmptyAccept(final String acceptFromRequest) {
         //Spring may send the Accept as "*\/*" (star/star) in case accept is not passed via request
         return (StringUtils.isBlank(acceptFromRequest) || StringUtils.equals(acceptFromRequest, MediaType.ALL_VALUE));
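In summary, the new handler splits the request body on newlines, reads the schema from the first line (falling back to the schema supplied via the environment, as the existing JSON endpoint does), parses each line as a multi-row `SageMakerRequestListObject` or, on `JsonMappingException`, as a single-row `SageMakerRequestObject`, scores every row through the existing `processInputData` path, and concatenates the per-row bodies into one response. A hedged sketch of a payload this logic accepts follows; the column names, values, and exact schema fields are illustrative only, and the schema portion of the first line may be omitted when the schema is passed through the `SAGEMAKER_SPARKML_SCHEMA` environment variable:

```
{"schema": {"input": [{"name": "feature_1", "type": "double"}, {"name": "feature_2", "type": "string"}], "output": {"name": "prediction", "type": "double"}}, "data": [1.0, "a"]}
{"data": [2.0, "b"]}
{"data": [[3.0, "c"], [4.0, "d"]]}
```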
src/main/java/com/amazonaws/sagemaker/dto/SageMakerRequestListObject.java

Lines changed: 49 additions & 0 deletions

@@ -0,0 +1,49 @@
+/*
+ * Copyright 2010-2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License").
+ * You may not use this file except in compliance with the License.
+ * A copy of the License is located at
+ *
+ * http://aws.amazon.com/apache2.0
+ *
+ * or in the "license" file accompanying this file. This file is distributed
+ * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
+ * express or implied. See the License for the specific language governing
+ * permissions and limitations under the License.
+ *
+ */
+
+package com.amazonaws.sagemaker.dto;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
+
+import java.util.List;
+
+/**
+ * Request object POJO to which data field of input request in JSONLINES format will be mapped to by Spring (using Jackson).
+ * For sample input, please see test/resources/com/amazonaws/sagemaker/dto
+ */
+public class SageMakerRequestListObject {
+
+    private DataSchema schema;
+    private List<List<Object>> data;
+
+    @JsonCreator
+    public SageMakerRequestListObject(@JsonProperty("schema") final DataSchema schema,
+        @JsonProperty("data") final List<List<Object>> data) {
+        // schema can be retrieved from environment variable as well, hence it is not enforced to be null
+        this.schema = schema;
+        this.data = Preconditions.checkNotNull(data);
+    }
+
+    public DataSchema getSchema() {
+        return schema;
+    }
+
+    public List<List<Object>> getData() {
+        return data;
+    }
+}
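The Javadoc above points to test/resources/com/amazonaws/sagemaker/dto for authoritative sample input. As a hedged illustration only (values are placeholders, and the optional schema field is omitted), a single JSONLines line that Jackson would bind to this POJO could look like:

```
{"data": [[1.0, "a"], [2.0, "b"]]}
```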
