列向量和派生位向量的加权和 - 版本 2

2024-02-27

我们有两个买家的出价和规模表。竞价p有尺寸s意味着买家愿意购买s价格中的产品数量p。我们有一个表,其中包含几列(如时间戳、有效性标志)以及这四列:

  • 两位买家提供的出价,pA and pB.
  • 投标尺寸,sA and sB.

我们的工作是添加一个新的最佳尺寸列(bS) 到表中,返回最佳价格的尺寸。如果两个买家的价格相同bS等于sA + sB,否则,我们需要采用提供更高价格的买家的出价大小。

An example table (ignoring columns that are neither prices nor sizes) with the desired output is below. enter image description here

一个简单的问题解决方案:

SELECT *,
  CASE
    WHEN pA = pB THEN sA + sB
    WHEN pA > pB THEN sA
    ELSE sB
  END AS bS
FROM t

现在让我们把这个问题概括给四位买家。标准的 SQL 解决方案是

WITH t_ext AS (
SELECT *, GREATEST(pA, pB, pC, pD) as bP
FROM `t` 
)
SELECT *, (sA * CAST(pA = bP AS INT64) + 
           sB * CAST(pB = bP AS INT64) + 
           sC * CAST(pC = bP AS INT64) +
           sD * CAST(pD = bP AS INT64)) 
AS bS FROM t_ext

问题:

有没有一个简化的查询

  • 使用函数 SUM 而不是手动添加四个项目
  • 避免重复铸造?

请注意,我们无法通过索引来识别价格和尺寸列,但是仅按名称。否则,我们可以使用提出的解决方案

列向量和派生位向量的加权和 https://stackoverflow.com/questions/56741959/weighted-sum-of-a-column-vector-and-a-derived-bit-vector/

顺便提一句。我写了一个博客文章 https://www.linkedin.com/pulse/data-analysis-example-python-q-ferenc-bodon-ph-d-/关于这个问题,重点关注Python和Q中的解决方案,我想知道标准sql中的最佳解决方案是什么样的。


以下是 BigQuery 标准 SQL

请注意,我们无法通过索引来识别价格和尺寸列,但是仅按名称

#standardSQL
WITH t_ext AS (
  SELECT * EXCEPT(arr), 
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS prices,
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET >= ARRAY_LENGTH(arr) / 2) AS sizes,
    (SELECT MAX(CAST(val AS INT64)) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS bestPrice
  FROM (
    SELECT *, REGEXP_EXTRACT_ALL(TO_JSON_STRING(T), r'(?:"(?:pA|pB|pC|pD|sA|sB|sC|sD)"):(\d+)') AS arr
    FROM `project.dataset.table` t
  )
)
SELECT * EXCEPT(prices, sizes), 
  (SELECT SUM(size)
    FROM UNNEST(prices) price WITH OFFSET
    JOIN UNNEST(sizes) size WITH OFFSET
    USING(OFFSET) 
    WHERE price = bestPrice
  ) AS bS
FROM t_ext

正如您所看到的 - 您唯一应该提供的是价格和尺寸列名称列表,如下例所示

pA|pB|pC|pD|sA|sB|sC|sD    

如果应用到虚拟数据如下

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' id, 1 pA, 2 pB, 3 pC, 4 pD, 'x' extra_col1, 1 sA, 1 sB, 1 sC, 5 sD UNION ALL
  SELECT 'b', 1, 4, 2, 4, 'y', 1, 6, 1, 5 UNION ALL
  SELECT 'c', 5, 4, 2, 1, 'z', 7, 1, 1, 1
), t_ext AS (
  SELECT * EXCEPT(arr), 
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS prices,
    ARRAY(SELECT CAST(val AS INT64) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET >= ARRAY_LENGTH(arr) / 2) AS sizes,
    (SELECT MAX(CAST(val AS INT64)) FROM UNNEST(arr) val WITH OFFSET WHERE OFFSET < ARRAY_LENGTH(arr) / 2) AS bestPrice
  FROM (
    SELECT *, REGEXP_EXTRACT_ALL(TO_JSON_STRING(T), r'(?:"(?:pA|pB|pC|pD|sA|sB|sC|sD)"):(\d+)') AS arr
    FROM `project.dataset.table` t
  )
)
SELECT * EXCEPT(prices, sizes), 
  (SELECT SUM(size)
    FROM UNNEST(prices) price WITH OFFSET
    JOIN UNNEST(sizes) size WITH OFFSET
    USING(OFFSET) 
    WHERE price = bestPrice
  ) AS bS
FROM t_ext

结果是

Row id  pA  pB  pC  pD  extra_col1  sA  sB  sC  sD  bestPrice   bS   
1   a   1   2   3   4   x           1   1   1   5   4           5    
2   b   1   4   2   4   y           1   6   1   5   4           11   
3   c   5   4   2   1   z           7   1   1   1   5           7      

希望,这就是您正在寻找的

本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)

列向量和派生位向量的加权和 - 版本 2 的相关文章

随机推荐