GitHub repository
openai/CLIP: Contrastive Language-Image Pretraining (https://github.com/openai/CLIP)
Create a Python environment
conda create -n CLIP python=3.8
conda activate CLIP
Install PyTorch and torchvision
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
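Before moving on, it can help to confirm that PyTorch imports and can see the GPU. The following is a minimal sketch, not part of the CLIP instructions; the helper name describe_torch is made up for illustration, and it returns a message instead of failing when torch is missing.

```python
def describe_torch():
    """Return a one-line summary of the local PyTorch install.

    describe_torch is a hypothetical helper used only for this check.
    """
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    # torch.version.cuda is None for CPU-only builds
    return (
        f"torch {torch.__version__}, "
        f"built for CUDA {torch.version.cuda}, "
        f"CUDA available: {torch.cuda.is_available()}"
    )


print(describe_torch())
```

If "CUDA available" is False, the usage example still runs, because it falls back to the CPU.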
Install the packages ftfy, regex, and tqdm, then CLIP itself
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
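To check that every package from the steps above is importable, a small sketch like the following can be run inside the CLIP environment. It uses only the standard library; the helper name missing_packages is an assumption made for illustration. Note that it only checks whether a module named clip can be found, not that it is the OpenAI package.

```python
import importlib.util


def missing_packages(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules installed by the conda and pip commands above
required = ["torch", "torchvision", "ftfy", "regex", "tqdm", "clip"]
missing = missing_packages(required)
print("Missing packages:", ", ".join(missing) if missing else "none")
```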
Usage example
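import torch
import clip
from PIL import Image

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the ViT-B/32 model together with its matching image preprocessing transform
model, preprocess = clip.load("ViT-B/32", device=device)

# Preprocess the image and add a batch dimension
image = preprocess(Image.open("clip.jpg")).unsqueeze(0).to(device)
# Tokenize the candidate text descriptions
text = clip.tokenize(["two dogs", "this is a dog", "two dogs on grass", "there are two dogs"]).to(device)

with torch.no_grad():
    # Embeddings for the image and for each description
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Scaled cosine-similarity logits between the image and each description
    logits_per_image, logits_per_text = model(image, text)
    # Softmax over the descriptions gives one probability per description
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)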
For example, given a dog photo as input,
the output is:
Label probs: [[0.2998 0.102 0.4163 0.1819]]
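Each value is the probability that the image matches the corresponding description, in the order the descriptions were passed to clip.tokenize. Because the values come from a softmax over the four descriptions, they sum to 1 and are only meaningful relative to this set of candidates.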
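The sketch below shows one way to read these numbers, using only the standard library. It picks the best-matching description from the probabilities printed above, then uses made-up logits (toy_logits is hypothetical and not CLIP output) to show that the softmax score of a description changes when the candidate list changes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["two dogs", "this is a dog", "two dogs on grass", "there are two dogs"]
reported_probs = [0.2998, 0.102, 0.4163, 0.1819]  # values from the output above

# Pick the description with the highest probability
best = max(range(len(labels)), key=lambda i: reported_probs[i])
print("Best match:", labels[best])
print("Sum of probabilities:", round(sum(reported_probs), 4))

# Hypothetical logits, only to illustrate that the scores are relative:
# dropping a candidate raises the share of the remaining ones.
toy_logits = [2.0, 1.0, 0.5]
full = softmax(toy_logits)
reduced = softmax(toy_logits[:2])
print("First candidate, three options:", round(full[0], 3))
print("First candidate, two options:", round(reduced[0], 3))
```

For this reason a high probability only says that a description fits better than the other candidates, not that it is an accurate caption in absolute terms.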