I was never able to solve this directly, but I did come up with a workaround.
I'm not a Wagtail or Django expert, so I'm sure there is a proper solution to this problem, but here is what I did anyway. If you have any suggestions for improvement, please feel free to comment.
As a note, this is really documentation to remind myself of what I did. At this point (05-25-19) there are a lot of redundant lines of code, because I Frankensteined a lot of code together. I will edit it over time.
Here are the tutorials that were Frankensteined together to create this solution:
- https://www.codingforentrepreneurs.com/blog/large-file-uploads-with-amazon-s3-django/
- http://docs.wagtail.io/en/v2.1.1/advanced_topics/documents/custom_document_model.html
- https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html
- https://medium.com/faun/summary-667d0fdbcdae
- http://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/loading-browser-credentials-federated-id.html
- https://kite.com/python/examples/454/threading-wait-for-a-thread-to-finish
- http://docs.celeryproject.org/en/latest/userguide/daemonizing.html#usage-systemd
There were probably a few others, but those are the main ones.
OK, here we go.
I created an app called "files" and gave its models.py a custom document model. You need to add WAGTAILDOCS_DOCUMENT_MODEL = 'files.LargeDocument' to your settings file. The only reason I did this was to be more explicit about which behavior I was changing. This custom document model just extends Wagtail's standard Document model.
# models.py
from django.db import models
from wagtail.documents.models import AbstractDocument
from wagtail.admin.edit_handlers import FieldPanel


class LargeDocument(AbstractDocument):
    # Extra fields set by the views and tasks below; AbstractDocument in the
    # Wagtail version linked above does not provide them itself.
    file_hash = models.TextField(blank=True)
    path = models.TextField(blank=True)
    type = models.CharField(max_length=255, blank=True)
    size = models.BigIntegerField(null=True, blank=True)
    uploaded = models.BooleanField(default=False)

    admin_form_fields = (
        'file',
    )
    panels = [
        FieldPanel('file', classname='fn'),
    ]
Next, create a wagtail_hooks.py file (Wagtail looks for this exact filename) with the following contents.
# wagtail_hooks.py
from wagtail.contrib.modeladmin.options import (
    ModelAdmin, modeladmin_register)
from .models import LargeDocument


class LargeDocumentAdmin(ModelAdmin):
    model = LargeDocument
    menu_label = 'Large Documents'  # ditch this to use verbose_name_plural from model
    menu_icon = 'pilcrow'  # change as required
    menu_order = 200  # will put in 3rd place (000 being 1st, 100 2nd)
    add_to_settings_menu = False  # or True to add your model to the Settings sub-menu
    exclude_from_explorer = False  # or True to exclude pages of this type from Wagtail's explorer view
    create_template_name = 'large_document_index.html'

# Now you just need to register your customised ModelAdmin class with Wagtail
modeladmin_register(LargeDocumentAdmin)
This lets you do two things:
- Create a new menu item for uploading large documents, while keeping the standard Documents menu item and its standard functionality.
- Specify a custom HTML file for handling large uploads.
Here is the HTML:
{% extends "wagtailadmin/base.html" %}
{% load static cache compress %}
{% load wagtailuserbar %}
{% load underscore_hyphan_to_space %}
{% load url_vars %}
{% load pagination_value %}
{% load i18n %}
{% block titletag %}{{ view.page_title }}{% endblock %}
{% block content %}
{% include "wagtailadmin/shared/header.html" with title=view.page_title icon=view.header_icon %}
<!-- Google Signin Button -->
<div class="g-signin2" data-onsuccess="onSignIn" data-theme="dark">
</div>
<!-- Select the file to upload -->
<div class="input-group mb-3">
<link rel="stylesheet" href="{% static 'css/input.css'%}"/>
<div class="custom-file">
<input type="file" class="custom-file-input" id="file" name="file">
<label id="file_label" class="custom-file-label" style="width:auto!important;" for="file">Choose file</label>
</div>
<div class="input-group-append">
<span class="input-group-text" id="file_submission_button">Upload</span>
</div>
<div id="start_progress"></div>
</div>
<div class="progress-upload">
<div class="progress-upload-bar" role="progressbar" style="width: 100%;" aria-valuenow="100" aria-valuemin="0" aria-valuemax="100"></div>
</div>
{% endblock %}
{% block extra_js %}
{{ block.super }}
{{ form.media.js }}
<script src="https://apis.google.com/js/platform.js" async defer></script>
<script src="https://sdk.amazonaws.com/js/aws-sdk-2.148.0.min.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
<script src="{% static 'js/awsupload.js' %}"></script>
{% endblock %}
{% block extra_css %}
{{ block.super }}
{{ form.media.css }}
<meta name="google-signin-client_id" content="847336061839-9h651ek1dv7u1i0t4edsk8pd20d0lkf3.apps.googleusercontent.com">
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css" integrity="sha384-ggOyR0iXCbMQv3Xipma34MD+dH/1fQ784/j6cY/iJTQUOhcWr7x9JvoRxT2MZw1T" crossorigin="anonymous">
{% endblock %}
Then I created some objects in views.py:
# views.py
import base64
import hashlib
import hmac
import os
import time

from django.shortcuts import render
from rest_framework import authentication, permissions, status
from rest_framework.response import Response
from rest_framework.views import APIView

from .config_aws import (
    AWS_UPLOAD_BUCKET,
    AWS_UPLOAD_REGION,
    AWS_UPLOAD_ACCESS_KEY_ID,
    AWS_UPLOAD_SECRET_KEY
)
from .models import LargeDocument
from .tasks import file_creator


class FilePolicyAPI(APIView):
    """
    This view is to get the AWS Upload Policy for our s3 bucket.
    What we do here is first create a LargeDocument object instance in our
    Django backend. This is to include the LargeDocument instance in the path
    we will use within our bucket as you'll see below.
    """
    permission_classes = [permissions.IsAuthenticated]
    authentication_classes = [authentication.SessionAuthentication]

    def post(self, request, *args, **kwargs):
        """
        The initial post request includes the filename
        and auth credentials. In our case, we'll use
        Session Authentication but any auth should work.
        """
        filename_req = request.data.get('filename')
        if not filename_req:
            return Response({"message": "A filename is required"}, status=status.HTTP_400_BAD_REQUEST)
        policy_expires = int(time.time() + 5000)
        user = request.user
        username_str = str(request.user.username)
        # Below we create the Django object. We'll use this in our upload
        # path to AWS.
        # Example:
        #   To-be-uploaded file's name: Some Random File.mp4
        #   Eventual path on S3: <bucket>/LargeDocuments/Some Random File.mp4
        doc_obj = LargeDocument.objects.create(uploaded_by_user=user)
        doc_obj_id = doc_obj.id
        doc_obj.title = filename_req
        upload_start_path = "LargeDocuments/"
        file_extension = os.path.splitext(filename_req)[1]
        filename_final = filename_req
        # Eventual upload path within the bucket. The file keeps its original
        # name here; renaming it to the instance ID would avoid issues with
        # user-generated names.
        final_upload_path = upload_start_path + filename_final
        if filename_req and file_extension:
            # Save the eventual path to the Django-stored LargeDocument instance
            policy_document_context = {
                "expire": policy_expires,
                "bucket_name": AWS_UPLOAD_BUCKET,
                "key_name": "",
                "acl_name": "public-read",
                "content_name": "",
                "content_length": 524288000,  # 500 MiB cap
                "upload_start_path": upload_start_path,
            }
            policy_document = """
            {"expiration": "2020-01-01T00:00:00Z",
              "conditions": [
                {"bucket": "%(bucket_name)s"},
                ["starts-with", "$key", "%(upload_start_path)s"],
                {"acl": "public-read"},
                ["starts-with", "$Content-Type", "%(content_name)s"],
                ["starts-with", "$filename", ""],
                ["content-length-range", 0, %(content_length)d]
              ]
            }
            """ % policy_document_context
            aws_secret = str.encode(AWS_UPLOAD_SECRET_KEY)
            policy_document_str_encoded = str.encode(policy_document.replace(" ", ""))
            url = 'https://thearchmedia.s3.amazonaws.com/'
            # Decode to str so the values survive JSON serialization in the Response.
            policy = base64.b64encode(policy_document_str_encoded).decode('utf-8')
            signature = base64.b64encode(
                hmac.new(aws_secret, policy.encode('utf-8'), hashlib.sha1).digest()
            ).decode('utf-8')
            doc_obj.file_hash = signature
            doc_obj.path = final_upload_path
            doc_obj.save()
            data = {
                "policy": policy,
                "signature": signature,
                "key": AWS_UPLOAD_ACCESS_KEY_ID,
                "file_bucket_path": upload_start_path,
                "file_id": doc_obj_id,
                "filename": filename_final,
                "url": url,
                "username": username_str,
            }
            return Response(data, status=status.HTTP_200_OK)
        return Response({"message": "Invalid filename"}, status=status.HTTP_400_BAD_REQUEST)


class FileUploadCompleteHandler(APIView):
    permission_classes = [permissions.IsAuthenticated]
    authentication_classes = [authentication.SessionAuthentication]

    def post(self, request, *args, **kwargs):
        file_id = request.POST.get('file')
        size = request.POST.get('fileSize')
        data = {}
        type_ = request.POST.get('fileType')
        if file_id:
            obj = LargeDocument.objects.get(id=int(file_id))
            obj.size = int(size)
            obj.uploaded = True
            obj.type = type_
            obj.save()
            data['id'] = obj.id
            data['saved'] = True
            data['url'] = obj.url
        return Response(data, status=status.HTTP_200_OK)


class ModelFileCompletion(APIView):
    permission_classes = [permissions.IsAuthenticated]
    authentication_classes = [authentication.SessionAuthentication]

    def post(self, request, *args, **kwargs):
        file_id = request.POST.get('file')
        url = request.POST.get('aws_url')
        data = {}
        if file_id:
            obj = LargeDocument.objects.get(id=int(file_id))
            file_creator.delay(obj.pk)
            data['test'] = 'process started'
        return Response(data, status=status.HTTP_200_OK)


def LargeDocumentAdminView(request):
    # The original attempted to pull context from WMABaseView here, which
    # doesn't work in a function-based view; a plain render keeps it functional.
    return render(request, 'modeladmin/files/index.html', {})
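The policy/signature pair built in FilePolicyAPI is a standard S3 V2 POST-policy signature: base64-encode the policy document, then HMAC-SHA1 it with the secret key. Pulled out of the view, the signing step alone looks like this (the policy body and secret are placeholders):

```python
import base64
import hashlib
import hmac


def sign_policy(policy_document: str, secret_key: str):
    """Base64-encode the policy and sign it with HMAC-SHA1, as S3 V2 POST uploads expect."""
    policy = base64.b64encode(policy_document.encode("utf-8")).decode("utf-8")
    signature = base64.b64encode(
        hmac.new(secret_key.encode("utf-8"), policy.encode("utf-8"), hashlib.sha1).digest()
    ).decode("utf-8")
    return policy, signature


policy, signature = sign_policy(
    '{"expiration": "2020-01-01T00:00:00Z", "conditions": []}', "example-secret"
)
# The policy round-trips through base64; the signature is a base64-encoded
# 20-byte SHA1 digest.
assert base64.b64decode(policy).decode("utf-8").startswith('{"expiration"')
assert len(base64.b64decode(signature)) == 20
```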
This view works around the standard file-handling system rather than through it. I didn't want to abandon the standard file-handling system or write a new one, which is why I call this hack a non-ideal solution.
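The views import their AWS settings from a config_aws module that the answer doesn't show. A minimal sketch might look like the following; reading from environment variables (and these variable names) is my assumption, not something from the original:

```python
# config_aws.py -- hypothetical sketch; the original module is not shown.
# Environment-variable names and defaults are assumptions.
import os

AWS_UPLOAD_BUCKET = os.environ.get("AWS_UPLOAD_BUCKET", "thearchmedia")
AWS_UPLOAD_REGION = os.environ.get("AWS_UPLOAD_REGION", "us-east-1")
AWS_UPLOAD_ACCESS_KEY_ID = os.environ.get("AWS_UPLOAD_ACCESS_KEY_ID", "")
AWS_UPLOAD_SECRET_KEY = os.environ.get("AWS_UPLOAD_SECRET_KEY", "")
```

Keeping the credentials out of source control matters here because the secret key signs the upload policy.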
// javascript upload file "awsupload.js"
var id_token; // token we get upon authentication with the Web Identity Provider

function onSignIn(googleUser) {
    var profile = googleUser.getBasicProfile();
    // The ID token you need to pass to your backend:
    id_token = googleUser.getAuthResponse().id_token;
}

$(document).ready(function(){
    // setup session cookie data. This is Django-related
    function getCookie(name) {
        var cookieValue = null;
        if (document.cookie && document.cookie !== '') {
            var cookies = document.cookie.split(';');
            for (var i = 0; i < cookies.length; i++) {
                var cookie = jQuery.trim(cookies[i]);
                // Does this cookie string begin with the name we want?
                if (cookie.substring(0, name.length + 1) === (name + '=')) {
                    cookieValue = decodeURIComponent(cookie.substring(name.length + 1));
                    break;
                }
            }
        }
        return cookieValue;
    }
    var csrftoken = getCookie('csrftoken');

    function csrfSafeMethod(method) {
        // these HTTP methods do not require CSRF protection
        return (/^(GET|HEAD|OPTIONS|TRACE)$/.test(method));
    }
    $.ajaxSetup({
        beforeSend: function(xhr, settings) {
            if (!csrfSafeMethod(settings.type) && !this.crossDomain) {
                xhr.setRequestHeader("X-CSRFToken", csrftoken);
            }
        }
    });
    // end session cookie data setup.

    // declare an empty array for potential uploaded files
    var fileItemList = []
    // upload progress, as a percentage
    var progress = 0

    $(document).on('click', '#file_submission_button', function(event){
        var selectedFiles = $('#file').prop('files');
        formItem = $(this).parent()
        $.each(selectedFiles, function(index, item){
            uploadFile(item)
        })
        $(this).val('');
        // reset the progress bar before the upload starts reporting
        progress = 0
        $('.progress-upload-bar').attr('aria-valuenow', progress);
        $('.progress-upload-bar').attr('style', "width:" + progress.toString() + '%');
        $('.progress-upload-bar').text(progress.toString() + '%');
    })

    $(document).on('change', '#file', function(event){
        var selectedFiles = $('#file').prop('files');
        $('#file_label').text(selectedFiles[0].name)
    })

    // Leftover from the presigned-POST approach; unused now that the upload
    // goes through the AWS SDK in uploadFile() below.
    function constructFormPolicyData(policyData, fileItem) {
        var contentType = fileItem.type != '' ? fileItem.type : 'application/octet-stream'
        var url = policyData.url
        var filename = policyData.filename
        var responseUser = policyData.user
        var keyPath = policyData.file_bucket_path
        var fd = new FormData()
        fd.append('key', keyPath + filename);
        fd.append('acl', 'private');
        fd.append('Content-Type', contentType);
        fd.append("AWSAccessKeyId", policyData.key)
        fd.append('Policy', policyData.policy);
        fd.append('filename', filename);
        fd.append('Signature', policyData.signature);
        fd.append('file', fileItem);
        return fd
    }

    function fileUploadComplete(fileItem, policyData){
        var data = {
            uploaded: true,
            fileSize: fileItem.size,
            file: policyData.file_id,
        }
        $.ajax({
            method: "POST",
            data: data,
            url: "/api/files/complete/",
            success: function(data){
                displayItems(fileItemList)
            },
            error: function(jqXHR, textStatus, errorThrown){
                alert("An error occurred, please refresh the page.")
            }
        })
    }

    function modelComplete(policyData, aws_url){
        var data = {
            file: policyData.file_id,
            aws_url: aws_url
        }
        $.ajax({
            method: "POST",
            data: data,
            url: "/api/files/modelcomplete/",
            success: function(data){
                console.log('model complete success')
            },
            error: function(jqXHR, textStatus, errorThrown){
                alert("An error occurred, please refresh the page.")
            }
        })
    }

    function displayItems(fileItemList){
        var itemList = $('.item-loading-queue')
        itemList.html("")
        $.each(fileItemList, function(index, obj){
            var item = obj.file
            var id_ = obj.id
            var order_ = obj.order
            var html_ = "<div class=\"progress\">" +
                "<div class=\"progress-bar\" role=\"progressbar\" style='width:" + item.progress + "%' aria-valuenow='" + item.progress + "' aria-valuemin=\"0\" aria-valuemax=\"100\"></div></div>"
            itemList.append("<div>" + order_ + ") " + item.name + "<a href='#' class='srvup-item-upload float-right' data-id='" + id_ + ")'>X</a> <br/>" + html_ + "</div><hr/>")
        })
    }

    function uploadFile(fileItem){
        var policyData;
        var newLoadingItem;
        // get the AWS upload policy for each file through the POST method.
        // Remember we're creating an instance in the backend, so using POST is
        // needed.
        $.ajax({
            method: "POST",
            data: {
                filename: fileItem.name
            },
            url: "/api/files/policy/",
            success: function(data){
                policyData = data
            },
            error: function(data){
                alert("An error occurred, please try again later")
            }
        }).done(function(){
            // construct the needed data using the policy for AWS
            var file = fileItem;
            AWS.config.credentials = new AWS.WebIdentityCredentials({
                RoleArn: 'arn:aws:iam::120974195102:role/thearchmedia-google-role',
                ProviderId: null, // this is null for Google
                WebIdentityToken: id_token // access token from the identity provider
            });
            var bucket = 'thearchmedia'
            var key = 'LargeDocuments/' + file.name
            var aws_url = 'https://' + bucket + '.s3.amazonaws.com/' + key
            var s3bucket = new AWS.S3({params: {Bucket: bucket}});
            var params = {Key: key, ContentType: file.type, Body: file, ACL: 'public-read'};
            s3bucket.upload(params, function (err, data) {
                $('#results').html(err ? 'ERROR!' : 'UPLOADED :' + data.Location);
            }).on('httpUploadProgress', function(evt) {
                progress = parseInt((evt.loaded * 100) / evt.total)
                $('.progress-upload-bar').attr('aria-valuenow', progress)
                $('.progress-upload-bar').attr('style', "width:" + progress.toString() + '%')
                $('.progress-upload-bar').text(progress.toString() + '%')
            }).send(function(err, data) {
                alert("File uploaded successfully.")
                fileUploadComplete(fileItem, policyData)
                modelComplete(policyData, aws_url)
            });
        })
    }
})
Notes on how the .js and views.py interact:
First, an Ajax call carrying the file information creates the Document object, but because the file never touches the server, no "file" object is created on the Document. That "file" object contains functionality I need, so more work is required. Next, my JavaScript file uploads the file to my S3 bucket with the AWS JavaScript SDK. The SDK's s3bucket.upload() function is robust enough to upload files up to 5 GB as-is, and up to 5 TB (the AWS limit) if you include some additional modifications. After the file reaches the S3 bucket, a final API call is made. That final API call triggers a Celery task that downloads the file to a temporary directory on the remote server. Once the file exists on my remote server, the file object is created and saved to the Document model.
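The JavaScript posts to /api/files/policy/, /api/files/complete/, and /api/files/modelcomplete/, but the answer doesn't show the URL configuration. A sketch consistent with those paths would be the following; the route names are my assumptions:

```python
# urls.py (fragment) -- hypothetical routes inferred from the paths the
# JavaScript posts to; the original answer does not show its urls.py.
from django.urls import path
from files.views import (
    FilePolicyAPI,
    FileUploadCompleteHandler,
    ModelFileCompletion,
)

urlpatterns = [
    path('api/files/policy/', FilePolicyAPI.as_view(), name='file-policy'),
    path('api/files/complete/', FileUploadCompleteHandler.as_view(), name='file-complete'),
    path('api/files/modelcomplete/', ModelFileCompletion.as_view(), name='file-model-complete'),
]
```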
The tasks.py file handles downloading the file from the S3 bucket to the remote server, then creating the File object and saving it to the Document model.
# tasks.py
import threading
import urllib.request

from celery import shared_task
from django.core.files import File
from django.core.mail import send_mail

from .models import LargeDocument


@shared_task
def file_creator(pk_num):
    obj = LargeDocument.objects.get(pk=pk_num)
    tmp_loc = 'tmp/' + obj.title

    def downloadit():
        urllib.request.urlretrieve(
            'https://thearchmedia.s3.amazonaws.com/LargeDocuments/' + obj.title,
            tmp_loc,
        )

    def after_dwn():
        dwn_thread.join()  # waits until the download thread has finished executing
        # next chunk of code, run after the download, goes here
        send_mail(
            obj.title + ' has finished downloading to the server',
            obj.title + ' downloaded to the server',
            '[email protected]',
            ['[email protected]'],
            fail_silently=False,
        )
        reopen = open(tmp_loc, 'rb')
        django_file = File(reopen)
        obj.file = django_file
        obj.save()
        send_mail(
            obj.title + ' file model created',
            'File model created for ' + obj.title,
            '[email protected]',
            ['[email protected]'],
            fail_silently=False,
        )

    dwn_thread = threading.Thread(target=downloadit)
    dwn_thread.start()
    metadata_thread = threading.Thread(target=after_dwn)
    metadata_thread.start()
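The thread choreography in file_creator (one thread does the download, a second thread joins it and only then runs the follow-up work) can be exercised in isolation with the standard library; the "download" here is simulated by appending to a list:

```python
# Minimal reproduction of the thread pattern used in file_creator above.
import threading

events = []

def downloadit():
    # stands in for the urlretrieve() call
    events.append("downloaded")

def after_dwn():
    dwn_thread.join()  # wait until the download thread has finished
    # stands in for the send_mail / File-creation follow-up work
    events.append("post-processed")

dwn_thread = threading.Thread(target=downloadit)
dwn_thread.start()
metadata_thread = threading.Thread(target=after_dwn)
metadata_thread.start()
metadata_thread.join()

# join() guarantees the follow-up runs strictly after the download completes
assert events == ["downloaded", "post-processed"]
```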
This process needs to run in Celery because downloading a large file takes time, and I didn't want to sit waiting with the browser open. Inside tasks.py there is also a Python thread() that forces the process to wait until the file has successfully downloaded to the remote server. If you're new to Celery, here is the start of their documentation: http://docs.celeryproject.org/en/master/getting-started/introduction.html
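For completeness, wiring Celery into a Django project typically looks like the following; the project name "mysite" is an assumption, not taken from the original answer:

```python
# mysite/celery.py -- hypothetical minimal Celery wiring ("mysite" is assumed)
import os

from celery import Celery

# point Celery at the Django settings before the app is created
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')

app = Celery('mysite')
# read CELERY_* settings from Django's settings module
app.config_from_object('django.conf:settings', namespace='CELERY')
# discover @shared_task functions (like file_creator) in installed apps
app.autodiscover_tasks()
```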
I also added some email notifications to confirm that the process completed.
One last detail: I created a /tmp directory in my project and set up a daily job that deletes files older than a day, to give it its "tmp" behavior:
crontab -e
find ~/thearchmedia/tmp -mtime +1 -delete