How to make a beautiful, borderless geographic thematic/heat map with weighted (survey) data in R, probably using spatial smoothing on point observations

2024-01-09

Ever since Joshua Katz published these dialect maps (http://spark.rstudio.com/jkatz/SurveyMaps/) built from the Harvard Dialect Survey (http://www4.uwm.edu/FLL/linguistics/dialect/maps.html), I've been trying to copy and generalize his methods.. but much of this is over my head. Josh disclosed some of his methods in this poster (http://www4.ncsu.edu/~jakatz2/files/dialectposter.png), but (as far as I can tell) has not released any of his code.

My goal is to generalize these methods so that users of any major US government survey dataset can easily drop their weighted data into a function and get a reasonable geographic map. Geographies vary: some survey datasets have ZCTAs, some have counties, some have states, some have metro areas, and so on. It's probably smart to start by plotting every point at its centroid; centroids are discussed here (http://www.census.gov/geo/reference/zctafaq.html) and available for most geographies in the Census Bureau's 2010 gazetteer files (http://www.census.gov/geo/maps-data/data/gazetteer2010.html). So, for each survey data point, you have a point on the map. But some survey responses have a weight of 10 while others have a weight of 100,000! Obviously, whatever "heat" or smoothing or coloring ends up on the map needs to account for the different weights.
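To make the centroid idea concrete, here is a minimal sketch of "one point per survey record, placed at its geography's centroid, with the symbol sized by its weight". The objects `svy` (survey records with a `GEOID` and a `weight` column) and `gaz` (a gazetteer table with `GEOID`, `INTPTLONG`, `INTPTLAT` columns, read in separately) are hypothetical placeholders, not part of the example data further down:

library(maps)

# hypothetical inputs: `svy` = survey records with GEOID + weight,
# `gaz` = gazetteer centroids with GEOID + INTPTLONG + INTPTLAT
svy.xy <- merge( svy , gaz[ , c( 'GEOID' , 'INTPTLONG' , 'INTPTLAT' ) ] , by = 'GEOID' )

# national outline, then one bubble per survey record,
# with the circle area proportional to the survey weight
map( "state" )
symbols(
    svy.xy$INTPTLONG , svy.xy$INTPTLAT ,
    circles = sqrt( svy.xy$weight ) ,
    inches = 0.1 , add = TRUE , fg = NA , bg = rgb( 1 , 0 , 0 , 0.3 )
)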

I'm good with survey data, but I know nothing about spatial smoothing or kernel estimation. The method Josh uses in his poster is k-nearest neighbor kernel smoothing with a gaussian kernel, which is foreign to me. I'm a novice at mapping, but I can usually get things working if I know what the goal should be.
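For whatever it's worth, here is my longhand reading of a weighted gaussian kernel smoother (a fixed-bandwidth version, not the k-nearest-neighbor variant from the poster, and definitely not Josh's code): at every grid cell, take a weighted average of the observed values, where each observation's weight is a gaussian function of its distance multiplied by its survey weight.

# toy fixed-bandwidth weighted gaussian smoother -- an assumption on my part,
# not code from the poster. `pts` is a hypothetical data.frame with
# columns lon, lat, value (the response) and weight (the survey weight)
weighted.gaussian.smooth <-
    function( grid.lon , grid.lat , pts , bandwidth ){
        sapply(
            seq_along( grid.lon ) ,
            function( i ){
                # squared distance from this grid cell to every observation
                d2 <- ( pts$lon - grid.lon[ i ] )^2 + ( pts$lat - grid.lat[ i ] )^2
                # gaussian kernel weight times the survey weight
                k <- exp( -d2 / ( 2 * bandwidth^2 ) ) * pts$weight
                # weighted average of the observed values at this cell
                sum( k * pts$value ) / sum( k )
            }
        )
    }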

Note: this question is very similar to one asked ten months ago that no longer contains usable data (https://stackoverflow.com/questions/17025273/create-heatmap-with-distribution-of-attribute-values-in-r-not-density-heatmap). There are also a few tidbits of information in this thread (https://stackoverflow.com/questions/14461954/heatmap-based-on-average-weights-and-not-on-the-number-of-data-points), but if someone has a smart way to answer my exact question, I'd obviously rather see that.

The R survey package has an svyplot function, and if you run these lines of code, you can see weighted data on cartesian coordinates. But really, for what I want to do, the plot needs to be overlaid onto a map.

library(survey)
data(api)
dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
svyplot(api00~api99, design=dstrat, style="bubble")

In case it's at all useful, I've posted some example code below that gives anyone willing to help me a quick way to get started with some survey data at core-based statistical areas (another geography type).

Any ideas, advice, guidance would be appreciated (and credited if I can get a formal tutorial/guide/how-to written for http://asdfree.com/)

Thanks!!

# load a few mapping libraries
library(rgdal)
library(maptools)
library(PBSmapping)


# specify some population data to download
mydata <- "http://www.census.gov/popest/data/metro/totals/2012/tables/CBSA-EST2012-01.csv"

# load mydata
x <- read.csv( mydata , skip = 9 , h = F )

# keep only the GEOID and the 2010 population estimate
x <- x[ , c( 'V1' , 'V6' ) ]

# name the GEOID column to match the CBSA shapefile
# and name the weight column the weight column!
names( x ) <- c( 'GEOID10' , "weight" )

# throw out the bottom few rows
x <- x[ 1:950 , ]

# convert the weight column to numeric
x$weight <- as.numeric( gsub( ',' , '' , as.character( x$weight ) ) )

# now just make some fake trinary data
x$trinary <- c( rep( 0:2 , 316 ) , 0:1 )

# simple tabulation
table( x$trinary )

# so now the `x` data file looks like this:
head( x )

# and say we just wanted to map
# something easy like
# 0=red, 1=green, 2=blue,
# weighted simply by the population of the cbsa

# # # end of data read-in # # #


# # # shapefile read-in? # # #

# specify the tiger file to download
tiger <- "ftp://ftp2.census.gov/geo/tiger/TIGER2010/CBSA/2010/tl_2010_us_cbsa10.zip"

# create a temporary file and a temporary directory
tf <- tempfile() ; td <- tempdir()

# download the tiger file to the local disk
download.file( tiger , tf , mode = 'wb' )

# unzip the tiger file into the temporary directory
z <- unzip( tf , exdir = td )

# isolate the file that ends with ".shp"
shapefile <- z[ grep( 'shp$' , z ) ]

# read the shapefile into working memory
cbsa.map <- readShapeSpatial( shapefile )

# remove CBSAs ending with alaska, hawaii, and puerto rico
cbsa.map <- cbsa.map[ !grepl( "AK$|HI$|PR$" , cbsa.map$NAME10 ) , ]

# cbsa.map$NAME10 now has a length of 933
length( cbsa.map$NAME10 )

# convert the cbsa.map shapefile into polygons..
cbsa.ps <- SpatialPolygons2PolySet( cbsa.map )

# but for some reason, cbsa.ps has 966 shapes??
nrow( unique( cbsa.ps[ , 1:2 ] ) )
# that seems wrong, but i'm not sure how to fix it?

# calculate the centroids of each CBSA
cbsa.centroids <- calcCentroid(cbsa.ps)
# (ignoring the fact that i'm doing something else wrong..because there's 966 shapes for 933 CBSAs?)

# # # # # # as far as i can get w/ mapping # # # #


# so now you've got
# the weighted data file `x` with the `GEOID10` field
# the shapefile with the matching `GEOID10` field
# the centroids of each location on the map


# can this be mapped nicely?

I'm not sure how much help I can be with the spatial smoothing, since it's a task I have little experience with, but I've spent some time making maps in R, so I hope what I've added below will help with that part of your question.

I've picked up editing your code from the # # # shapefile read-in # # # section; you'll notice that I kept the map in the SpatialPolygonsDataFrame class and that I rely on the raster and gstat packages to build the grid and run the spatial smoothing. The spatial smoothing model is the part I'm least familiar with, but the process let me make a raster and demonstrate how to mask, project and plot it.

library(rgdal)
library(raster)
library(gstat)

# read in a base map
m <- getData("GADM", country="United States", level=1)
m <- m[!m$NAME_1 %in% c("Alaska","Hawaii"),]

# specify the tiger file to download
tiger <- "ftp://ftp2.census.gov/geo/tiger/TIGER2010/CBSA/2010/tl_2010_us_cbsa10.zip"

# create a temporary file and a temporary directory
tf <- tempfile() ; td <- tempdir()

# download the tiger file to the local disk
download.file( tiger , tf , mode = 'wb' )

# unzip the tiger file into the temporary directory
z <- unzip( tf , exdir = td )

# isolate the file that ends with ".shp"
shapefile <- z[ grep( 'shp$' , z ) ]

# read the shapefile into working memory
cbsa.map <- readOGR( shapefile, layer="tl_2010_us_cbsa10" )

# remove CBSAs ending with alaska, hawaii, and puerto rico
cbsa.map <- cbsa.map[ !grepl( "AK$|HI$|PR$" , cbsa.map$NAME10 ) , ]

# cbsa.map$NAME10 now has a length of 933
length( cbsa.map$NAME10 )

# extract centroid for each CBSA
cbsa.centroids <- data.frame(coordinates(cbsa.map), cbsa.map$GEOID10)
names(cbsa.centroids) <- c("lon","lat","GEOID10")

# add lat/lon to the population data
nrow(x)
x <- merge(x, cbsa.centroids, by="GEOID10")
nrow(x) # centroids could not be assigned to all records for some reason

# create a raster object
r <- raster(nrow=500, ncol=500, 
            xmn=bbox(m)["x","min"], xmx=bbox(m)["x","max"],
            ymn=bbox(m)["y","min"], ymx=bbox(m)["y","max"],
            crs=proj4string(m))

# run inverse distance weighted model - modified code from ?interpolate...needs more research
model <- gstat(id = "trinary", formula = trinary~1, weights = "weight", locations = ~lon+lat, data = x,
               nmax = 7, set=list(idp = 0.5))
r <- interpolate(r, model, xyNames=c("lon","lat"))
r <- mask(r, m) # discard interpolated values outside the states

# project map for plotting (optional)
# North America Lambert Conformal Conic
nalcc <- CRS("+proj=lcc +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs")
m <- spTransform(m, nalcc)
r <- projectRaster(r, crs=nalcc)

# plot map
par(mar=c(0,0,0,0), bty="n")
cols <- c(rgb(0.9,0.8,0.8), rgb(0.9,0.4,0.3),
          rgb(0.8,0.8,0.9), rgb(0.4,0.6,0.9),
          rgb(0.8,0.9,0.8), rgb(0.4,0.9,0.6))
col.ramp <- colorRampPalette(cols) # custom colour ramp
plot(r, axes=FALSE, legend=FALSE, col=col.ramp(100))
plot(m, add=TRUE) # overlay base map
legend("right", pch=22, pt.bg=cols[c(2,4,6)], legend=c(0,1,2), bty="n")