How do I get a response from S3 getObject in Node.js?

2023-11-26

In a Node.js project, I am trying to get data back from S3.

When I use getSignedUrl, everything works:

aws.getSignedUrl('getObject', params, function(err, url){
    console.log(url); 
});

My params are:

var params = {
    Bucket: "test-aws-imagery",
    Key: "TILES/Level4/A3_B3_C2/A5_B67_C59_Tiles.par"
};

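If I log the URL to the console and paste it into a web browser, it downloads the file I need.

However, if I try to use getObject, I get all sorts of odd behavior. I believe I'm just using it incorrectly. This is what I've tried: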

aws.getObject(params, function(err, data){
    console.log(data); 
    console.log(err); 
});

Output:

{ 
  AcceptRanges: 'bytes',
  LastModified: 'Wed, 06 Apr 2016 20:04:02 GMT',
  ContentLength: '1602862',
  ETag: '"9826l1e5725fbd52l88ge3f5v0c123a4"',
  ContentType: 'application/octet-stream',
  Metadata: {},
  Body: <Buffer 01 00 00 00  ... > }

  null

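So it appears that this is working properly. However, when I put a breakpoint on one of the console.log calls, my IDE (NetBeans) throws an error and refuses to show the value of data. While this could just be the IDE, I decided to try other ways of using getObject.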

aws.getObject(params).on('httpData', function(chunk){
    console.log(chunk); 
}).on('httpDone', function(data){
    console.log(data); 
});

This does not output anything. Setting a breakpoint shows that the code never reaches either of the console.log calls. I also tried:

aws.getObject(params).on('success', function(data){
    console.log(data); 
});

However, this also does not output anything, and setting a breakpoint shows that the console.log is never reached.

What am I doing wrong?

@aws-sdk/client-s3 (2022 Update)

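Since I wrote this answer in 2016, Amazon has released a new JavaScript SDK, @aws-sdk/client-s3. This new version improves on the original getObject() by always returning a promise instead of requiring you to opt in by chaining .promise() onto getObject(). In addition, response.Body is no longer a Buffer but one of Readable|ReadableStream|Blob. This changes how response.Body is handled a bit. It should be more performant, since we can stream the returned data instead of holding all of the contents in memory, with the trade-off that it is a bit more verbose to implement.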

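In the getObject() example function below, the response.Body data is streamed into an array and then returned as a string. This is the equivalent of the example in my original answer. Alternatively, response.Body could be piped with stream.Readable.pipe() to an HTTP response, a file, or any other kind of stream.Writable for further use, which would be the more performant approach when getting large objects.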
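As a short aside on the piping alternative just mentioned, here is a minimal sketch of writing a body stream to a file. It makes several assumptions: `saveBodyToFile`, `fakeBody`, and the temp-file path are hypothetical names, `fakeBody` is a hand-built Readable standing in for a real `response.Body`, and it uses `pipeline()` from `stream/promises` rather than a bare `.pipe()` call so that stream errors reject the returned promise.

```javascript
const { Readable } = require('node:stream')
const { pipeline } = require('node:stream/promises')
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// Hypothetical helper: stream a Readable body to disk without
// holding the whole object in memory.
async function saveBodyToFile (body, filePath) {
  // pipeline() connects the streams and rejects if either side errors
  await pipeline(body, fs.createWriteStream(filePath))
  return filePath
}

// Stand-in for response.Body (assumption: in Node.js it is a Readable)
const fakeBody = Readable.from([Buffer.from('hello '), Buffer.from('world')])
const target = path.join(os.tmpdir(), 's3-object-sketch.txt')

const savedPromise = saveBodyToFile(fakeBody, target)
savedPromise.then(file => console.log('saved to', file))
```

With a real client, the first argument would be the `Body` of the response returned by `client.send()`; the array-based example discussed next is unaffected by this aside.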

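If you want a Buffer, like the original getObject() response, this can be done by wrapping responseDataChunks in Buffer.concat() instead of using Array#join(), which is useful when interacting with binary data. Note that since Array#join() returns a string, each Buffer instance in responseDataChunks will have Buffer.toString() called implicitly, and the default encoding of utf8 will be used.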

const { GetObjectCommand, S3Client } = require('@aws-sdk/client-s3')
const client = new S3Client() // Pass in opts to S3 if necessary

function getObject (Bucket, Key) {
  return new Promise(async (resolve, reject) => {
    const getObjectCommand = new GetObjectCommand({ Bucket, Key })

    try {
      const response = await client.send(getObjectCommand)
  
      // Store all of data chunks returned from the response data stream 
      // into an array then use Array#join() to use the returned contents as a String
      let responseDataChunks = []

      // Handle an error while streaming the response body
      response.Body.once('error', err => reject(err))
  
      // Attach a 'data' listener to add the chunks of data to our array
      // Each chunk is a Buffer instance
      response.Body.on('data', chunk => responseDataChunks.push(chunk))
  
      // Once the stream has no more data, join the chunks into a string and return the string
      response.Body.once('end', () => resolve(responseDataChunks.join('')))
    } catch (err) {
      // Handle the error or throw
      return reject(err)
    } 
  })
}
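The Buffer.concat() variant described above could look roughly like the following sketch. `streamToBuffer` and `fakeChunks` are hypothetical names, and the stream is a hand-built stand-in for `response.Body` rather than a real S3 response. The stand-in deliberately splits the three UTF-8 bytes of a single character across two chunks, to show why concatenating the Buffers first is safer than joining them as strings.

```javascript
const { Readable } = require('node:stream')

// Hypothetical helper: collect every chunk of a Readable into one Buffer
function streamToBuffer (stream) {
  return new Promise((resolve, reject) => {
    const chunks = []
    stream.once('error', reject)
    stream.on('data', chunk => chunks.push(chunk))
    // Concatenate the raw bytes; no string decoding happens here
    stream.once('end', () => resolve(Buffer.concat(chunks)))
  })
}

// Stand-in for response.Body: the UTF-8 bytes of the euro sign
// (e2 82 ac), split across two chunks
const fakeChunks = [Buffer.from([0xe2, 0x82]), Buffer.from([0xac])]

const bufferPromise = streamToBuffer(Readable.from(fakeChunks))
bufferPromise.then(buf => {
  // Decoding once, after concatenation, keeps the character intact
  console.log(buf.toString('utf8'))
  // Array#join() decodes each chunk on its own, so the split
  // character is not recovered
  console.log(fakeChunks.join('') === buf.toString('utf8'))
})
```

For text payloads, calling `toString()` on the concatenated Buffer gives the same result as the string-returning example whenever no multi-byte character straddles a chunk boundary.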

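Comments on using `Readable.toArray()`

Using Readable.toArray() instead of working with the stream events directly might be more convenient, but it performs worse. It works by reading all response data chunks into memory before moving on. Since this removes all benefits of streaming, the Node.js documentation discourages this approach.

From the Node.js documentation for Readable.toArray(): "As this method reads the entire stream into memory, it negates the benefits of streams. It's intended for interoperability and convenience, not as the primary way to consume streams."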
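For comparison, a Readable.toArray() version might look like the sketch below. `streamToBufferViaToArray` and `fakeBody` are hypothetical names, the stream is again a stand-in for `response.Body`, and the sketch assumes a Node.js release that ships `Readable.prototype.toArray()` (the method is marked experimental in the Node.js documentation).

```javascript
const { Readable } = require('node:stream')

// Hypothetical helper: read the whole stream into memory in one call
async function streamToBufferViaToArray (stream) {
  // toArray() resolves with an array of every chunk in the stream
  const chunks = await stream.toArray()
  return Buffer.concat(chunks)
}

// Stand-in for response.Body
const fakeBody = Readable.from([Buffer.from('hello '), Buffer.from('world')])

const textPromise = streamToBufferViaToArray(fakeBody)
  .then(buf => buf.toString('utf8'))
textPromise.then(text => console.log(text))
```

This is shorter than wiring up 'data' and 'end' listeners, but as noted above it holds the entire object in memory, so it is best kept for small objects.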

`@aws-sdk/client-s3` documentation links

GetObjectCommand
GetObjectCommandInput
GetObjectCommandOutput

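aws-sdk (Original Answer)

When doing a getObject() from the S3 API, per the docs the contents of your file are located in the Body property, which you can see in your sample output. Your code should look something like the following: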

const aws = require('aws-sdk');
const s3 = new aws.S3(); // Pass in opts to S3 if necessary

var getParams = {
    Bucket: 'abc', // your bucket name,
    Key: 'abc.txt' // path to the object you're looking for
}

s3.getObject(getParams, function(err, data) {
    // Handle any error and exit
    if (err)
        return err;

    // No error happened
    // Convert Body from a Buffer to a String
    let objectData = data.Body.toString('utf-8'); // Use the encoding necessary
});

You may not need to create a new Buffer from the data.Body object, but if you need to, you can use the sample above to do so.
