Using the Visual Genome API with Python 3, and Dataset Details

2023-05-16

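The Visual Genome dataset

Visual Genome homepage
Visual Genome API
Visual Genome Python Driver
Visual Genome paper
Note: the API is mostly implemented for Python 2. To use it with Python 3.8, I changed a few places in its source code; watch for the notes in the code comments, and leave a comment if you run into problems.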

Installing the API

pip install visual-genome
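
Since a few files of the installed driver have to be edited by hand (see the note above), it helps to know where pip placed them. The following is a minimal sketch, assuming the package imports as `visual_genome` and contains the `api` and `utils` modules; `find_module_file` is a hypothetical helper name and not part of the driver.

```python
import importlib.util
from pathlib import Path


def find_module_file(module_name):
    """Return the path of an importable module's source file, or None."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised, for example, when the parent package is not installed.
        return None
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


# Prints None for each module if the driver is not installed.
for name in ("visual_genome.api", "visual_genome.utils"):
    print(name, "->", find_module_file(name))
```

The printed paths are the files to open when applying the manual fixes.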

Code

Note: two places in the comments below flag a code issue; each one requires manually editing the source code of the installed API.
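
The first of these fixes adds `int` conversions. A common reason such a conversion becomes necessary in Python 3 is that `/` always returns a float, and `range()` rejects floats. The sketch below illustrates the effect with a hypothetical page computation; `IDS_PER_PAGE` and `page_range` are illustrative names and values, not the driver's actual code.

```python
IDS_PER_PAGE = 1000  # assumed page size, for illustration only


def page_range(start_index, end_index):
    """Return the 1-based page numbers covering [start_index, end_index]."""
    # In Python 3, start_index / IDS_PER_PAGE is a float (2000 / 1000 == 2.0),
    # so it is converted with int() before being passed to range().
    start_page = int(start_index / IDS_PER_PAGE) + 1
    end_page = int(end_index / IDS_PER_PAGE) + 1
    return list(range(start_page, end_page + 1))


print(page_range(2000, 2010))  # [3]

try:
    range(2000 / IDS_PER_PAGE + 1)  # Python 2 style expression, no int()
except TypeError as err:
    print("TypeError:", err)
```

Using `//` instead of `int(... / ...)` would give the same result for non-negative indices.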

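'''
Fetch the dataset with the visual_genome API, version 1.1.1
Reference 1: https://github.com/ranjaykrishna/visual_genome_python_driver
Reference 2: https://visualgenome.org/api/v0/api_object_model.html
Install with: pip install visual-genome
Note: the driver targets Python 2 by default; this script uses Python 3 and
relies on a few changes to the driver's source code
'''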
from visual_genome import api
import matplotlib.pyplot as plt
import requests
from PIL import Image
from io import BytesIO
from matplotlib.patches import Rectangle

# get the list of all image ids in the Visual Genome dataset
ids = api.get_all_image_ids()
print(ids[0])
# >> 1

# There are currently 108249 images; to get only the ids of images 2000 to 2010:
# Code issue: because of a Python 2 vs. Python 3 difference, manually edit lines 27 and 28 of api.py to add int
ids_in_range = api.get_image_ids_in_range(start_index=2000, end_index=2010)
print(ids_in_range)
# >>> [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011]

# Get image data, including url, width, height, COCO and Flickr ids
image = api.get_image_data(id=61512)
print(image)
# >>> id: 61512, coco_id: 248774, flickr_id: 6273011878, width: 1024,
#     url: https://cs.stanford.edu/people/rak248/VG_100K/61512.jpg

# Get region descriptions for an image; these are the dense captions of the image
# Each region description is a textual description of a particular region in the image
# bbox format: x and y of the top-left corner, then width and height
regions = api.get_region_descriptions_of_image(id=61512)
print(regions[0])
# >>> id: 1, x: 511, y: 241, width: 206,height: 320, phrase: A brown, sleek horse with a bridle, image: 61512

# Get Region Graph from Region
# Region Graphs are tiny scene graphs for a particular region of an image,
# containing objects, attributes and relationships.
# We will get the scene graph of an image and print out the objects, attributes and relationships
graph = api.get_region_graph_of_region(image_id=61512,region_id=1)
# Remember that region description is 'A brown, sleek horse with a bridle'
print(graph.objects)
# >>> [horse]
print(graph.attributes)
# >>> [3015675: horse is brown]
print(graph.relationships)
# >>> []
# The region graph has one object: horse and one attribute: brown to describe the horse. no relationships

# Get Scene Graph for an image
# Each scene graph has three components: objects, attributes, and relationships.
graph = api.get_scene_graph_of_image(id=61512)
# print the object, only the name and not the bbox
print(graph.objects)
# >>>  [horse, grass, horse, bridle, truck, sign, gate, truck, tire, trough, window, door, building, halter,
#        mane, mane,leaves, fence]
# print the attributes
print(graph.attributes)
# >>> [3015675: horse is brown, 3015676: horse is spotted, 3015677: horse is red, 3015678: horse is dark brown,
#       3015679: truck is red, 3015680: horse is brown, 3015681: truck is red, 3015682: sign is blue,
#       3015683: gate is red, 3015684: truck is white, 3015685: tire is blue, 3015686: gate is wooden,
#       3015687: horse is standing, 3015688: truck is red, 3420018: horse is brown, 3420019: horse is white,
#       3015690: building is tan, 3015691: halter is red, 3015692: horse is brown, 3015693: gate is wooden,
#       3015694: grass is grassy, 3015695: truck is red, 3015696: gate is orange, 3015697: halter is red,
#       3015698: tire is blue, 3015699: truck is white, 3015700: trough is white, 3420016: horse is brown,
#       3420017: horse is cream, 3015702: leaves is green, 3015703: grass is lush, 3015704: horse is enclosed,
#       3420022: horse is brown, 3420023: horse is white, 3015706: horse is chestnut, 3015707: gate is red,
#       3015708: leaves is green, 3015709: building is brick, 3015710: truck is large, 3015711: gate is red,
#       3015712: horse is chestnut colored, 3015713: fence is wooden]
# print the relationships
print(graph.relationships)
# >>> [3199950: horse stands on top of grass, 3199951: horse IN grass, 3199952: horse WEARING bridle,
#      3199953: trough for horse, 3199954: window next to door, 3199955: building has door,
#      3199956: horse nudging horse, 3199957: horse has mane, 3199958: horse has mane, 3199959: trough for horse]

# Get Question Answers for an image
# Each Question Answer object contains the id of the question-answer pair, the id of image,
#    the question and the answer string, as well as the list of question objects and answer
#    objects identified and canonicalized in the qa pair.
# Code issue: manually change visual_genome/utils.py to
#   qas.append(QA(info['id'], image_map[info['image']],
qas = api.get_QA_of_image(id=61512)
# First print out some core information of the QA
print(qas[1])
# >>> id: 991155, image: 61512, question: What is the window treatment?, answer: White blinds.
# Now let's print out the question objects of the QA
print(qas[1].q_objects)
# >>> []

# Get all Questions Answers in the dataset
# We can get all 1.7 million QAs in the Visual Genome dataset, if we don't want to get all the data,
#    we can also specify how many QAs we want the function to return using the parameter qtotal
qas = api.get_all_QAs(qtotal=10)
print(qas[0])
# >>> id: 991155, image: 61512, question: What is the window treatment?, answer: White blinds.

# Get one type of Questions Answers from the entire dataset
# We can choose one type of <what, who, why, when, how>
qas = api.get_QA_of_type(qtotal=10, qtype='why')
print(qas[0])
# >>> id: 133089, image: 1159910, question: Why is the man cosplaying?, answer: For an event.

# Visualize some regions; see https://visualgenome.org/api/v0/api_beginners_tutorial.html
image = api.get_image_data(id=61512)
regions = api.get_region_descriptions_of_image(id=61512)
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)
def visualize_regions(image, regions):
    response = requests.get(image.url)
    img = Image.open(BytesIO(response.content))
    plt.imshow(img)
    ax = plt.gca()
    for region in regions:
        ax.add_patch(Rectangle((region.x, region.y),
                               region.width,
                               region.height,
                               fill=False,
                               edgecolor='red',
                               linewidth=3))
        ax.text(region.x, region.y, region.phrase, style='italic', bbox={'facecolor':'white', 'alpha':0.7, 'pad':10})
    # Hide the tick labels; current matplotlib expects booleans here, not 'off'
    plt.tick_params(labelbottom=False, labelleft=False)
    plt.show()
#visualize_regions(image, regions[:8])
#visualize_regions(image, regions) # plot all

Visualized regions

[Figure: output of visualize_regions]
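
The regions drawn above use a box given as the top-left corner plus width and height, while cropping with PIL's `Image.crop` takes a `(left, upper, right, lower)` tuple. Below is a small sketch of that conversion; `Box` and `to_corners` are hypothetical helpers, and `Box` only stands in for a region object with `x`, `y`, `width` and `height` fields.

```python
from collections import namedtuple

# Stand-in for a region object; only the four box fields are modeled here.
Box = namedtuple("Box", ["x", "y", "width", "height"])


def to_corners(box, image_width=None, image_height=None):
    """Convert (x, y, width, height) to (left, upper, right, lower).

    If the image size is given, the box is clamped to the image bounds.
    """
    left, upper = box.x, box.y
    right, lower = box.x + box.width, box.y + box.height
    if image_width is not None:
        left, right = max(0, left), min(image_width, right)
    if image_height is not None:
        upper, lower = max(0, upper), min(image_height, lower)
    return (left, upper, right, lower)


# Box values taken from the first region printed in the script above.
first_region = Box(x=511, y=241, width=206, height=320)
print(to_corners(first_region))  # (511, 241, 717, 561)
```

The resulting tuple could then be passed to `img.crop(...)` on the PIL image loaded in `visualize_regions`.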

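Full dataset format

If you download the full dataset locally and read the JSON files to analyze the annotation format, the dataset format can be summarized as follows:
[Figure: summary of the dataset's annotation format]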
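
One way to carry out this kind of inspection is to print an outline of each JSON file's nesting. The sketch below makes no assumption about the real annotation schema; `describe` and `describe_file` are hypothetical helper names, and the sample data is made up purely to show the output shape.

```python
import json


def describe(value, depth=0, max_depth=6):
    """Return lines outlining the nested structure of parsed JSON data."""
    pad = "  " * depth
    if isinstance(value, dict):
        lines = [f"{pad}dict ({len(value)} keys)"]
        if depth < max_depth:
            for key, child in value.items():
                lines.append(f"{pad}  {key}:")
                lines.extend(describe(child, depth + 2, max_depth))
        return lines
    if isinstance(value, list):
        lines = [f"{pad}list ({len(value)} items)"]
        if value and depth < max_depth:
            # Only the first item is inspected, assuming items share one shape.
            lines.extend(describe(value[0], depth + 1, max_depth))
        return lines
    return [f"{pad}{type(value).__name__}"]


def describe_file(path):
    """Load a JSON file and return the outline of its structure."""
    with open(path, encoding="utf-8") as handle:
        return describe(json.load(handle))


# Made-up sample, not the actual Visual Genome schema.
sample = json.loads('[{"id": 1, "items": [{"text": "a horse", "x": 5}]}]')
print("\n".join(describe(sample)))
```

For the downloaded files, `describe_file` would be called with the local path of each JSON file.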

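Paper summary

After reading the paper, which runs to more than 40 pages, I distilled its main points as follows:
[Figure: summary of the paper's main points]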
