I was never able to solve this directly, but I did come up with a workaround.
I'm not a Wagtail or Django expert, so I'm sure there is a proper solution to this problem, but here is what I did anyway. If you have any suggestions for improvement, please feel free to comment.
As a note, this is really documentation to remind myself of what I did. At this point (05-25-19) there are a lot of redundant lines of code, because I Frankensteined a lot of code together. I will edit it over time.
Here are the tutorials that were Frankensteined together to create this solution:
- https://www.codingforentrepreneurs.com/blog/large-file-uploads-with-amazon-s3-django/
- http://docs.wagtail.io/en/v2.1.1/advanced_topics/documents/custom_document_model.html
- https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html
- https://medium.com/faun/summary-667d0fdbcdae
- http://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/loading-browser-credentials-federated-id.html
- https://kite.com/python/examples/454/threading-wait-for-a-thread-to-finish
- http://docs.celeryproject.org/en/latest/userguide/daemonizing.html#usage-systemd
There were probably a few others, but those are the main ones.
OK, here we go.
I created an app called "files" and gave its models.py a custom document model. You need to add WAGTAILDOCS_DOCUMENT_MODEL = 'files.LargeDocument' to your settings file. The only reason I did this was to be more explicit about which behavior I was changing. This custom document model just extends Wagtail's standard Document model.
# models.py
from django.db import models
from wagtail.documents.models import AbstractDocument
from wagtail.admin.edit_handlers import FieldPanel


class LargeDocument(AbstractDocument):
    # Extra fields set by the views and tasks below; AbstractDocument in the
    # Wagtail version linked above does not provide them itself.
    file_hash = models.TextField(blank=True)
    path = models.TextField(blank=True)
    type = models.CharField(max_length=255, blank=True)
    size = models.BigIntegerField(null=True, blank=True)
    uploaded = models.BooleanField(default=False)

    admin_form_fields = (
        'file',
    )
    panels = [
        FieldPanel('file', classname='fn'),
    ]
Next, create a wagtail_hooks.py file (Wagtail looks for this exact filename) with the following contents.
# wagtail_hooks.py
from wagtail.contrib.modeladmin.options import (
    ModelAdmin, modeladmin_register)
from .models import LargeDocument


class LargeDocumentAdmin(ModelAdmin):
    model = LargeDocument
    menu_label = 'Large Documents'  # ditch this to use verbose_name_plural from model
    menu_icon = 'pilcrow'  # change as required
    menu_order = 200  # will put in 3rd place (000 being 1st, 100 2nd)
    add_to_settings_menu = False  # or True to add your model to the Settings sub-menu
    exclude_from_explorer = False  # or True to exclude pages of this type from Wagtail's explorer view
    create_template_name = 'large_document_index.html'

# Now you just need to register your customised ModelAdmin class with Wagtail
modeladmin_register(LargeDocumentAdmin)
This lets you do two things:
- Create a new menu item for uploading large documents, while keeping the standard Documents menu item and its standard functionality.
- Specify a custom HTML file for handling large uploads.
Here is the HTML:
{% extends "wagtailadmin/base.html" %}
{% load static cache compress %}
{% load wagtailuserbar %}
{% load underscore_hyphan_to_space %}
{% load url_vars %}
{% load pagination_value %}
{% load i18n %}
{% block titletag %}{{ view.page_title }}{% endblock %}
{% block content %}
{% include "wagtailadmin/shared/header.html" with title=view.page_title icon=view.header_icon %}
<!-- Google Signin Button -->
<div class="g-signin2" data-onsuccess="onSignIn" data-theme="dark">
</div>
<!-- Select the file to upload -->
<div class="input-group mb-3">
<link rel="stylesheet" href="{% static 'css/input.css'%}"/>
<div class="custom-file">
<input type="file" class="custom-file-input" id="file" name="file">
<label id="file_label" class="custom-file-label" style="width:auto!important;" for="file">Choose file</label>
</div>
<div class="input-group-append">
<span class="input-group-text" id="file_submission_button">Upload</span>
</div>
<div id="start_progress"></div>
</div>
<div class="progress-upload">
<div class="progress-upload-bar" role="progressbar" style="width: 100%;" aria-valuenow="100" aria-valuemin="0" aria-valuemax="100"></div>
</div>
{% endblock %}
{% block extra_js %}
{{ block.super }}
{{ form.media.js }}
<script src="https://apis.google.com/js/platform.js" async defer></script>
<script src="https://sdk.amazonaws.com/js/aws-sdk-2.148.0.min.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
<script src="{% static 'js/awsupload.js' %}"></script>
{% endblock %}
{% block extra_css %}
{{ block.super }}
{{ form.media.css }}
<meta name="google-signin-client_id" content="847336061839-9h651ek1dv7u1i0t4edsk8pd20d0lkf3.apps.googleusercontent.com">
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">
{% endblock %}
Then I created some objects in views.py:
# views.py
import base64
import hashlib
import hmac
import os
import time

from django.shortcuts import render
from rest_framework import authentication, permissions, status
from rest_framework.response import Response
from rest_framework.views import APIView

from .config_aws import (
    AWS_UPLOAD_BUCKET,
    AWS_UPLOAD_REGION,
    AWS_UPLOAD_ACCESS_KEY_ID,
    AWS_UPLOAD_SECRET_KEY
)
from .models import LargeDocument
from .tasks import file_creator


class FilePolicyAPI(APIView):
    """
    This view is to get the AWS Upload Policy for our s3 bucket.
    What we do here is first create a LargeDocument object instance in our
    Django backend. This is to include the LargeDocument instance in the path
    we will use within our bucket as you'll see below.
    """
    permission_classes = [permissions.IsAuthenticated]
    authentication_classes = [authentication.SessionAuthentication]

    def post(self, request, *args, **kwargs):
        """
        The initial post request includes the filename
        and auth credentials. In our case, we'll use
        Session Authentication but any auth should work.
        """
        filename_req = request.data.get('filename')
        if not filename_req:
            return Response({"message": "A filename is required"}, status=status.HTTP_400_BAD_REQUEST)
        policy_expires = int(time.time() + 5000)
        user = request.user
        username_str = str(request.user.username)
        # Below we create the Django object. We'll use this in our upload
        # path to AWS.
        # Example:
        #   To-be-uploaded file's name: Some Random File.mp4
        #   Eventual path on S3: <bucket>/LargeDocuments/Some Random File.mp4
        doc_obj = LargeDocument.objects.create(uploaded_by_user=user)
        doc_obj_id = doc_obj.id
        doc_obj.title = filename_req
        upload_start_path = "LargeDocuments/"
        file_extension = os.path.splitext(filename_req)[1]
        filename_final = filename_req
        # Eventual upload path within the bucket. The file keeps its original
        # name here; renaming it to the instance ID would avoid issues with
        # user-generated names.
        final_upload_path = upload_start_path + filename_final
        if filename_req and file_extension:
            # Save the eventual path to the Django-stored LargeDocument instance
            policy_document_context = {
                "expire": policy_expires,
                "bucket_name": AWS_UPLOAD_BUCKET,
                "key_name": "",
                "acl_name": "public-read",
                "content_name": "",
                "content_length": 524288000,  # 500 MiB cap
                "upload_start_path": upload_start_path,
            }
            policy_document = """
            {"expiration": "2020-01-01T00:00:00Z",
              "conditions": [
                {"bucket": "%(bucket_name)s"},
                ["starts-with", "$key", "%(upload_start_path)s"],
                {"acl": "public-read"},
                ["starts-with", "$Content-Type", "%(content_name)s"],
                ["starts-with", "$filename", ""],
                ["content-length-range", 0, %(content_length)d]
              ]
            }
            """ % policy_document_context
            aws_secret = str.encode(AWS_UPLOAD_SECRET_KEY)
            policy_document_str_encoded = str.encode(policy_document.replace(" ", ""))
            url = 'https://thearchmedia.s3.amazonaws.com/'
            # Decode to str so the values survive JSON serialization in the Response.
            policy = base64.b64encode(policy_document_str_encoded).decode('utf-8')
            signature = base64.b64encode(
                hmac.new(aws_secret, policy.encode('utf-8'), hashlib.sha1).digest()
            ).decode('utf-8')
            doc_obj.file_hash = signature
            doc_obj.path = final_upload_path
            doc_obj.save()
            data = {
                "policy": policy,
                "signature": signature,
                "key": AWS_UPLOAD_ACCESS_KEY_ID,
                "file_bucket_path": upload_start_path,
                "file_id": doc_obj_id,
                "filename": filename_final,
                "url": url,
                "username": username_str,
            }
            return Response(data, status=status.HTTP_200_OK)
        return Response({"message": "Invalid filename"}, status=status.HTTP_400_BAD_REQUEST)


class FileUploadCompleteHandler(APIView):
    permission_classes = [permissions.IsAuthenticated]
    authentication_classes = [authentication.SessionAuthentication]

    def post(self, request, *args, **kwargs):
        file_id = request.POST.get('file')
        size = request.POST.get('fileSize')
        data = {}
        type_ = request.POST.get('fileType')
        if file_id:
            obj = LargeDocument.objects.get(id=int(file_id))
            obj.size = int(size)
            obj.uploaded = True
            obj.type = type_
            obj.save()
            data['id'] = obj.id
            data['saved'] = True
            data['url'] = obj.url
        return Response(data, status=status.HTTP_200_OK)


class ModelFileCompletion(APIView):
    permission_classes = [permissions.IsAuthenticated]
    authentication_classes = [authentication.SessionAuthentication]

    def post(self, request, *args, **kwargs):
        file_id = request.POST.get('file')
        url = request.POST.get('aws_url')
        data = {}
        if file_id:
            obj = LargeDocument.objects.get(id=int(file_id))
            file_creator.delay(obj.pk)
            data['test'] = 'process started'
        return Response(data, status=status.HTTP_200_OK)


def LargeDocumentAdminView(request):
    # The original attempted to pull context from WMABaseView here, which
    # doesn't work in a function-based view; a plain render keeps it functional.
    return render(request, 'modeladmin/files/index.html', {})
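The policy/signature pair built in FilePolicyAPI is a standard S3 V2 POST-policy signature: base64-encode the policy document, then HMAC-SHA1 it with the secret key. Pulled out of the view, the signing step alone looks like this (the policy body and secret are placeholders):

```python
import base64
import hashlib
import hmac


def sign_policy(policy_document: str, secret_key: str):
    """Base64-encode the policy and sign it with HMAC-SHA1, as S3 V2 POST uploads expect."""
    policy = base64.b64encode(policy_document.encode("utf-8")).decode("utf-8")
    signature = base64.b64encode(
        hmac.new(secret_key.encode("utf-8"), policy.encode("utf-8"), hashlib.sha1).digest()
    ).decode("utf-8")
    return policy, signature


policy, signature = sign_policy(
    '{"expiration": "2020-01-01T00:00:00Z", "conditions": []}', "example-secret"
)
# The policy round-trips through base64; the signature is a base64-encoded
# 20-byte SHA1 digest.
assert base64.b64decode(policy).decode("utf-8").startswith('{"expiration"')
assert len(base64.b64decode(signature)) == 20
```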
This view works around the standard file-handling system rather than through it. I didn't want to abandon the standard file-handling system or write a new one, which is why I call this hack a non-ideal solution.
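The views import their AWS settings from a config_aws module that the answer doesn't show. A minimal sketch might look like the following; reading from environment variables (and these variable names) is my assumption, not something from the original:

```python
# config_aws.py -- hypothetical sketch; the original module is not shown.
# Environment-variable names and defaults are assumptions.
import os

AWS_UPLOAD_BUCKET = os.environ.get("AWS_UPLOAD_BUCKET", "thearchmedia")
AWS_UPLOAD_REGION = os.environ.get("AWS_UPLOAD_REGION", "us-east-1")
AWS_UPLOAD_ACCESS_KEY_ID = os.environ.get("AWS_UPLOAD_ACCESS_KEY_ID", "")
AWS_UPLOAD_SECRET_KEY = os.environ.get("AWS_UPLOAD_SECRET_KEY", "")
```

Keeping the credentials out of source control matters here because the secret key signs the upload policy.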
// javascript upload file "awsupload.js"
var id_token; // token we get upon authentication with the Web Identity Provider

function onSignIn(googleUser) {
    var profile = googleUser.getBasicProfile();
    // The ID token you need to pass to your backend:
    id_token = googleUser.getAuthResponse().id_token;
}

$(document).ready(function(){
    // setup session cookie data. This is Django-related
    function getCookie(name) {
        var cookieValue = null;
        if (document.cookie && document.cookie !== '') {
            var cookies = document.cookie.split(';');
            for (var i = 0; i < cookies.length; i++) {
                var cookie = jQuery.trim(cookies[i]);
                // Does this cookie string begin with the name we want?
                if (cookie.substring(0, name.length + 1) === (name + '=')) {
                    cookieValue = decodeURIComponent(cookie.substring(name.length + 1));
                    break;
                }
            }
        }
        return cookieValue;
    }
    var csrftoken = getCookie('csrftoken');

    function csrfSafeMethod(method) {
        // these HTTP methods do not require CSRF protection
        return (/^(GET|HEAD|OPTIONS|TRACE)$/.test(method));
    }
    $.ajaxSetup({
        beforeSend: function(xhr, settings) {
            if (!csrfSafeMethod(settings.type) && !this.crossDomain) {
                xhr.setRequestHeader("X-CSRFToken", csrftoken);
            }
        }
    });
    // end session cookie data setup.

    // declare an empty array for potential uploaded files
    var fileItemList = []
    // upload progress, as a percentage
    var progress = 0

    $(document).on('click', '#file_submission_button', function(event){
        var selectedFiles = $('#file').prop('files');
        formItem = $(this).parent()
        $.each(selectedFiles, function(index, item){
            uploadFile(item)
        })
        $(this).val('');
        // reset the progress bar before the upload starts reporting
        progress = 0
        $('.progress-upload-bar').attr('aria-valuenow', progress);
        $('.progress-upload-bar').attr('style', "width:" + progress.toString() + '%');
        $('.progress-upload-bar').text(progress.toString() + '%');
    })

    $(document).on('change', '#file', function(event){
        var selectedFiles = $('#file').prop('files');
        $('#file_label').text(selectedFiles[0].name)
    })

    // Leftover from the presigned-POST approach; unused now that the upload
    // goes through the AWS SDK in uploadFile() below.
    function constructFormPolicyData(policyData, fileItem) {
        var contentType = fileItem.type != '' ? fileItem.type : 'application/octet-stream'
        var url = policyData.url
        var filename = policyData.filename
        var responseUser = policyData.user
        var keyPath = policyData.file_bucket_path
        var fd = new FormData()
        fd.append('key', keyPath + filename);
        fd.append('acl', 'private');
        fd.append('Content-Type', contentType);
        fd.append("AWSAccessKeyId", policyData.key)
        fd.append('Policy', policyData.policy);
        fd.append('filename', filename);
        fd.append('Signature', policyData.signature);
        fd.append('file', fileItem);
        return fd
    }

    function fileUploadComplete(fileItem, policyData){
        var data = {
            uploaded: true,
            fileSize: fileItem.size,
            file: policyData.file_id,
        }
        $.ajax({
            method: "POST",
            data: data,
            url: "/api/files/complete/",
            success: function(data){
                displayItems(fileItemList)
            },
            error: function(jqXHR, textStatus, errorThrown){
                alert("An error occurred, please refresh the page.")
            }
        })
    }

    function modelComplete(policyData, aws_url){
        var data = {
            file: policyData.file_id,
            aws_url: aws_url
        }
        $.ajax({
            method: "POST",
            data: data,
            url: "/api/files/modelcomplete/",
            success: function(data){
                console.log('model complete success')
            },
            error: function(jqXHR, textStatus, errorThrown){
                alert("An error occurred, please refresh the page.")
            }
        })
    }

    function displayItems(fileItemList){
        var itemList = $('.item-loading-queue')
        itemList.html("")
        $.each(fileItemList, function(index, obj){
            var item = obj.file
            var id_ = obj.id
            var order_ = obj.order
            var html_ = "<div class=\"progress\">" +
                "<div class=\"progress-bar\" role=\"progressbar\" style='width:" + item.progress + "%' aria-valuenow='" + item.progress + "' aria-valuemin=\"0\" aria-valuemax=\"100\"></div></div>"
            itemList.append("<div>" + order_ + ") " + item.name + "<a href='#' class='srvup-item-upload float-right' data-id='" + id_ + ")'>X</a> <br/>" + html_ + "</div><hr/>")
        })
    }

    function uploadFile(fileItem){
        var policyData;
        var newLoadingItem;
        // get the AWS upload policy for each file through the POST method.
        // Remember we're creating an instance in the backend, so using POST is
        // needed.
        $.ajax({
            method: "POST",
            data: {
                filename: fileItem.name
            },
            url: "/api/files/policy/",
            success: function(data){
                policyData = data
            },
            error: function(data){
                alert("An error occurred, please try again later")
            }
        }).done(function(){
            // construct the needed data using the policy for AWS
            var file = fileItem;
            AWS.config.credentials = new AWS.WebIdentityCredentials({
                RoleArn: 'arn:aws:iam::120974195102:role/thearchmedia-google-role',
                ProviderId: null, // this is null for Google
                WebIdentityToken: id_token // access token from the identity provider
            });
            var bucket = 'thearchmedia'
            var key = 'LargeDocuments/' + file.name
            var aws_url = 'https://' + bucket + '.s3.amazonaws.com/' + key
            var s3bucket = new AWS.S3({params: {Bucket: bucket}});
            var params = {Key: key, ContentType: file.type, Body: file, ACL: 'public-read'};
            s3bucket.upload(params, function (err, data) {
                $('#results').html(err ? 'ERROR!' : 'UPLOADED :' + data.Location);
            }).on('httpUploadProgress', function(evt) {
                progress = parseInt((evt.loaded * 100) / evt.total)
                $('.progress-upload-bar').attr('aria-valuenow', progress)
                $('.progress-upload-bar').attr('style', "width:" + progress.toString() + '%')
                $('.progress-upload-bar').text(progress.toString() + '%')
            }).send(function(err, data) {
                alert("File uploaded successfully.")
                fileUploadComplete(fileItem, policyData)
                modelComplete(policyData, aws_url)
            });
        })
    }
})
Notes on how the .js and views.py interact:
First, an Ajax call carrying the file information creates the Document object, but because the file never touches the server, no "file" object is created on the Document. That "file" object contains functionality I need, so more work is required. Next, my JavaScript file uploads the file to my S3 bucket with the AWS JavaScript SDK. The SDK's s3bucket.upload() function is robust enough to upload files up to 5 GB as-is, and up to 5 TB (the AWS limit) if you include some additional modifications. After the file reaches the S3 bucket, a final API call is made. That final API call triggers a Celery task that downloads the file to a temporary directory on the remote server. Once the file exists on my remote server, the file object is created and saved to the Document model.
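The JavaScript posts to /api/files/policy/, /api/files/complete/, and /api/files/modelcomplete/, but the answer doesn't show the URL configuration. A sketch consistent with those paths would be the following; the route names are my assumptions:

```python
# urls.py (fragment) -- hypothetical routes inferred from the paths the
# JavaScript posts to; the original answer does not show its urls.py.
from django.urls import path
from files.views import (
    FilePolicyAPI,
    FileUploadCompleteHandler,
    ModelFileCompletion,
)

urlpatterns = [
    path('api/files/policy/', FilePolicyAPI.as_view(), name='file-policy'),
    path('api/files/complete/', FileUploadCompleteHandler.as_view(), name='file-complete'),
    path('api/files/modelcomplete/', ModelFileCompletion.as_view(), name='file-model-complete'),
]
```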
The tasks.py file handles downloading the file from the S3 bucket to the remote server, then creating the File object and saving it to the Document model.
# tasks.py
import threading
import urllib.request

from celery import shared_task
from django.core.files import File
from django.core.mail import send_mail

from .models import LargeDocument


@shared_task
def file_creator(pk_num):
    obj = LargeDocument.objects.get(pk=pk_num)
    tmp_loc = 'tmp/' + obj.title

    def downloadit():
        urllib.request.urlretrieve(
            'https://thearchmedia.s3.amazonaws.com/LargeDocuments/' + obj.title,
            tmp_loc,
        )

    def after_dwn():
        dwn_thread.join()  # waits until the download thread has finished executing
        # next chunk of code, run after the download, goes here
        send_mail(
            obj.title + ' has finished downloading to the server',
            obj.title + ' downloaded to the server',
            '[email protected]',
            ['[email protected]'],
            fail_silently=False,
        )
        reopen = open(tmp_loc, 'rb')
        django_file = File(reopen)
        obj.file = django_file
        obj.save()
        send_mail(
            obj.title + ' file model created',
            'File model created for ' + obj.title,
            '[email protected]',
            ['[email protected]'],
            fail_silently=False,
        )

    dwn_thread = threading.Thread(target=downloadit)
    dwn_thread.start()
    metadata_thread = threading.Thread(target=after_dwn)
    metadata_thread.start()
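The thread choreography in file_creator (one thread does the download, a second thread joins it and only then runs the follow-up work) can be exercised in isolation with the standard library; the "download" here is simulated by appending to a list:

```python
# Minimal reproduction of the thread pattern used in file_creator above.
import threading

events = []

def downloadit():
    # stands in for the urlretrieve() call
    events.append("downloaded")

def after_dwn():
    dwn_thread.join()  # wait until the download thread has finished
    # stands in for the send_mail / File-creation follow-up work
    events.append("post-processed")

dwn_thread = threading.Thread(target=downloadit)
dwn_thread.start()
metadata_thread = threading.Thread(target=after_dwn)
metadata_thread.start()
metadata_thread.join()

# join() guarantees the follow-up runs strictly after the download completes
assert events == ["downloaded", "post-processed"]
```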
This process needs to run in Celery because downloading a large file takes time, and I didn't want to sit waiting with the browser open. Inside tasks.py there is also a Python thread() that forces the process to wait until the file has successfully downloaded to the remote server. If you're new to Celery, here is the start of their documentation: http://docs.celeryproject.org/en/master/getting-started/introduction.html
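For completeness, wiring Celery into a Django project typically looks like the following; the project name "mysite" is an assumption, not taken from the original answer:

```python
# mysite/celery.py -- hypothetical minimal Celery wiring ("mysite" is assumed)
import os

from celery import Celery

# point Celery at the Django settings before the app is created
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')

app = Celery('mysite')
# read CELERY_* settings from Django's settings module
app.config_from_object('django.conf:settings', namespace='CELERY')
# discover @shared_task functions (like file_creator) in installed apps
app.autodiscover_tasks()
```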
I also added some email notifications to confirm that the process completed.
One last detail: I created a /tmp directory in my project and set up a daily job that deletes files older than a day, to give it its "tmp" behavior:
crontab -e
find ~/thearchmedia/tmp -mtime +1 -delete