Hive doc

https://cwiki.apache.org/confluence/display/Hive/GettingStarted

DISCLAIMER: Hive has only been tested on Unix (Linux) and Mac systems using Java 1.6 for now,
although it may very well work on other similar platforms. It does not work on Cygwin.
Most of our testing has been on Hadoop 0.20, so we advise running it against this version even though it may compile or work against other versions.

Hive introduction videos from Cloudera

Hive Introduction Video

Hive Tutorial Video

Installation and Configuration

Requirements

  • Java 1.6
  • Hadoop 0.20.x.

Installing Hive from a Stable Release

Start by downloading the most recent stable release of Hive from one of the Apache download mirrors (see Hive Releases).

Next you need to unpack the tarball. This will result in the creation of a subdirectory named hive-x.y.z:

  $ tar -xzvf hive-x.y.z.tar.gz

Set the environment variable HIVE_HOME to point to the installation directory:

  $ cd hive-x.y.z
  $ export HIVE_HOME=$(pwd)

Finally, add $HIVE_HOME/bin to your PATH:

  $ export PATH=$HIVE_HOME/bin:$PATH

Building Hive from Source

The Hive SVN repository is located here: http://svn.apache.org/repos/asf/hive/trunk

  $ svn co http://svn.apache.org/repos/asf/hive/trunk hive
  $ cd hive
  $ ant clean package
  $ cd build/dist
  $ ls
  README.txt
  bin/ (all the shell scripts)
  lib/ (required jar files)
  conf/ (configuration files)
  examples/ (sample input and query files)

In the rest of the page, we use build/dist and <install-dir> interchangeably.

Running Hive

Hive uses Hadoop, which means:

  • you must have Hadoop in your path, OR
  • export HADOOP_HOME=<hadoop-install-dir>

In addition, you must create /tmp and /user/hive/warehouse
(a.k.a. hive.metastore.warehouse.dir) and make them group-writable
(chmod g+w) in HDFS before a table can be created in Hive.

Commands to perform this setup:

  $ $HADOOP_HOME/bin/hadoop fs -mkdir       /tmp
  $ $HADOOP_HOME/bin/hadoop fs -mkdir       /user/hive/warehouse
  $ $HADOOP_HOME/bin/hadoop fs -chmod g+w   /tmp
  $ $HADOOP_HOME/bin/hadoop fs -chmod g+w   /user/hive/warehouse
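
As an optional sanity check, list the directories to confirm they exist and are group-writable (the listing format varies across Hadoop versions):

  $ $HADOOP_HOME/bin/hadoop fs -ls /tmp /user/hive/warehouse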

I also find it useful, but not necessary, to set HIVE_HOME:

  $ export HIVE_HOME=<hive-install-dir>

To use the Hive command line interface (CLI) from the shell:

  $ $HIVE_HOME/bin/hive

Configuration Management Overview

  • Hive's default configuration is stored in <install-dir>/conf/hive-default.xml.
    Configuration variables can be changed by (re-)defining them in <install-dir>/conf/hive-site.xml.
  • The location of the Hive configuration directory can be changed by setting the HIVE_CONF_DIR environment variable.
  • Log4j configuration is stored in <install-dir>/conf/hive-log4j.properties.
  • Hive configuration is an overlay on top of Hadoop, meaning the Hadoop configuration variables are inherited by default.
  • Hive configuration can be manipulated by:
      • editing hive-site.xml and defining any desired variables (including Hadoop variables) in it (a sample override is sketched below)
      • using the SET command from the CLI (see below)
      • invoking Hive using the syntax
          $ bin/hive -hiveconf x1=y1 -hiveconf x2=y2
        which sets the variables x1 and x2 to y1 and y2 respectively
      • setting the HIVE_OPTS environment variable to "-hiveconf x1=y1 -hiveconf x2=y2", which does the same as above
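
For illustration, a minimal hive-site.xml overriding a single variable might look like the sketch below; the property shown simply restates the default warehouse location and is only an example of the overlay mechanism, not a recommended setting:

  <?xml version="1.0"?>
  <configuration>
    <!-- any Hive or Hadoop variable may be (re-)defined here -->
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/hive/warehouse</value>
    </property>
  </configuration>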

Runtime Configuration

  • Hive queries are executed as map-reduce jobs and, therefore, the behavior
    of such queries can be controlled by the Hadoop configuration variables.
  • The CLI command SET can be used to set any Hadoop (or Hive) configuration variable. For example:
    hive> SET mapred.job.tracker=myhost.mycompany.com:50030;
    hive> SET -v;

The latter shows all the current settings. Without the -v option, only the
variables that differ from the base Hadoop configuration are displayed.

Hive, Map-Reduce and Local-Mode

The Hive compiler generates map-reduce jobs for most queries. These jobs are then submitted to the Map-Reduce cluster indicated by the variable:

  mapred.job.tracker

While this usually points to a map-reduce cluster with multiple nodes, Hadoop also offers a nifty option to run map-reduce jobs locally on the user's workstation. This can be very useful for running queries over small data sets; in such cases local mode execution is usually significantly faster than submitting jobs to a large cluster. Data is accessed transparently from HDFS. Conversely, local mode runs with only one reducer and can be very slow when processing larger data sets.

Starting with release 0.7, Hive fully supports local mode execution. To use it, set the following option:

  hive> SET mapred.job.tracker=local;

In addition, mapred.local.dir should point to a path that is valid on the local machine (for example /tmp/<username>/mapred/local); otherwise, the user will get an exception when allocating local disk space.
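
Both settings can be applied from the CLI in one session; the scratch path below is illustrative:

  hive> SET mapred.job.tracker=local;
  hive> SET mapred.local.dir=/tmp/<username>/mapred/local;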

Starting with release 0.7, Hive also supports a mode that runs map-reduce jobs in local mode automatically. The relevant option is:

  hive> SET hive.exec.mode.local.auto=false;

Note that this feature is disabled by default. If it is enabled, Hive analyzes the size of each map-reduce job in a query and may run it locally if the following thresholds are satisfied:

  • The total input size of the job is lower than: hive.exec.mode.local.auto.inputbytes.max (128MB by default)
  • The total number of map-tasks is less than: hive.exec.mode.local.auto.tasks.max (4 by default)
  • The total number of reduce tasks required is 1 or 0.

So for queries over small data sets, or for queries with multiple map-reduce jobs where the input to subsequent jobs is substantially smaller (because of reduction/filtering in the prior job), jobs may be run locally.
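
Putting this together, a session that enables automatic local mode and (optionally) tightens the thresholds named above might look like the following sketch; the threshold values are arbitrary examples, not recommendations:

  hive> SET hive.exec.mode.local.auto=true;
  hive> SET hive.exec.mode.local.auto.inputbytes.max=50000000;
  hive> SET hive.exec.mode.local.auto.tasks.max=4;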

Note that there may be differences in the runtime environment of Hadoop server nodes and the machine running the Hive client (because of different JVM versions or different software libraries). This can cause unexpected behavior/errors while running in local mode. Also note that local mode execution is done in a separate, child JVM (of the Hive client). If the user so wishes, the maximum amount of memory for this child JVM can be controlled via the option hive.mapred.local.mem. By default, it's set to zero, in which case Hive lets Hadoop determine the default memory limits of the child JVM.

Error Logs

Hive uses log4j for logging. By default logs are not emitted to the
console by the CLI. The default logging level is WARN and the logs are stored in the folder:

  • /tmp/<user.name>/hive.log

If the user wishes, the logs can be emitted to the console by adding
the arguments shown below:

  • bin/hive -hiveconf hive.root.logger=INFO,console

Alternatively, the user can change the logging level only by using:

  • bin/hive -hiveconf hive.root.logger=INFO,DRFA

Note that setting hive.root.logger via the 'set' command does not
change logging properties since they are determined at initialization time.

Hive also stores query logs on a per-session basis in /tmp/<user.name>/; the location can be configured in hive-site.xml with the hive.querylog.location property.

Logging during Hive execution on a Hadoop cluster is controlled by Hadoop configuration. Usually Hadoop will produce one log file per map and reduce task stored on the cluster machine(s) where the task was executed. The log files can be obtained by clicking through to the Task Details page from the Hadoop JobTracker Web UI.

When using local mode (mapred.job.tracker=local), Hadoop/Hive execution logs are produced on the client machine itself. Starting with release 0.6, Hive uses hive-exec-log4j.properties (falling back to hive-log4j.properties only if it's missing) to determine where these logs are delivered by default. The default configuration file produces one log file per query executed in local mode and stores it under /tmp/<user.name>. The intent of providing a separate configuration file is to enable administrators to centralize execution log capture if desired (on an NFS file server, for example). Execution logs are invaluable for debugging run-time errors.

Error logs are very useful to debug problems. Please send them with any bugs (of which there are many!) to hive-dev@hadoop.apache.org.

DDL Operations

Creating Hive tables and browsing through them

  hive> CREATE TABLE pokes (foo INT, bar STRING);

Creates a table called pokes with two columns, the first being an integer and the other a string.

  hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

Creates a table called invites with two columns and a partition column
called ds. The partition column is a virtual column. It is not part
of the data itself but is derived from the partition that a
particular dataset is loaded into.

By default, tables are assumed to be of text input format and the
delimiters are assumed to be ^A (ctrl-a). A different delimiter can be
declared at table-creation time, as sketched below.
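
For instance, a comma-delimited variant of the pokes table (the table name pokes_csv is hypothetical) could be declared as:

  hive> CREATE TABLE pokes_csv (foo INT, bar STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;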

  hive> SHOW TABLES;

lists all the tables

  hive> SHOW TABLES '.*s';

lists all the tables that end with 's'. The pattern matching follows Java regular
expressions. Check out this link for documentation: http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html

  hive> DESCRIBE invites;

shows the list of columns

As for altering tables, table names can be changed and additional columns can be added:

  hive> ALTER TABLE pokes ADD COLUMNS (new_col INT);
  hive> ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment');
  hive> ALTER TABLE events RENAME TO 3koobecaf;

Dropping tables:

  hive> DROP TABLE pokes;

Metadata Store

Metadata is stored in an embedded Derby database whose disk storage location is determined by the
Hive configuration variable named javax.jdo.option.ConnectionURL. By default
(see conf/hive-default.xml), this location is ./metastore_db.

Right now, in the default configuration, this metadata can only be seen by
one user at a time.

The metastore can be stored in any database that is supported by JPOX. The
location and the type of the RDBMS can be controlled by the two variables
javax.jdo.option.ConnectionURL and javax.jdo.option.ConnectionDriverName.
Refer to the JDO (or JPOX) documentation for more details on supported databases.
The database schema is defined in the JDO metadata annotations file package.jdo
at src/contrib/hive/metastore/src/model.
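
As a sketch, pointing the metastore at another JPOX-supported database amounts to overriding these two variables in hive-site.xml. The MySQL host, database name, and connection options below are illustrative assumptions, and the JDBC driver jar must also be on Hive's classpath:

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- hypothetical MySQL instance -->
    <value>jdbc:mysql://dbhost/hive_metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>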

In the future, the metastore itself can be a standalone server.

If you want to run the metastore as a network server so it can be accessed
from multiple nodes try HiveDerbyServerMode.

DML Operations

Loading data from flat files into Hive:

  hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;

Loads a file that contains two columns separated by ctrl-a into the pokes table.
'local' signifies that the input file is on the local file system. If 'local'
is omitted, Hive looks for the file in HDFS.

The keyword 'overwrite' signifies that existing data in the table is deleted.
If the 'overwrite' keyword is omitted, data files are appended to existing data sets.

NOTES:

  • NO verification of data against the schema is performed by the load command.
  • If the file is in HDFS, it is moved into the Hive-controlled file system namespace.
    The root of the Hive directory is specified by the option hive.metastore.warehouse.dir
    in hive-default.xml. We advise users to create this directory before
    trying to create tables via Hive.

  hive> LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
  hive> LOAD DATA LOCAL INPATH './examples/files/kv3.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-08');

The two LOAD statements above load data into two different partitions of the table
invites. Table invites must be created as partitioned by the key ds for this to succeed.
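
The partitions just created can be listed to verify the loads:

  hive> SHOW PARTITIONS invites;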

  hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');

The above command will load data from an HDFS file/directory to the table.
Note that loading data from HDFS will result in moving the file/directory. As a result, the operation is almost instantaneous.

SQL Operations

Example Queries

Some example queries are shown below. They are available in build/dist/examples/queries.
More are available in the hive sources at ql/src/test/queries/positive

SELECTS and FILTERS

  hive> SELECT a.foo FROM invites a WHERE a.ds='2008-08-15';

selects column 'foo' from all rows of partition ds=2008-08-15 of the invites table. The results are not
stored anywhere, but are displayed on the console.

Note that in all the examples that follow, INSERT (into a hive table, local
directory or HDFS directory) is optional.

  hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a WHERE a.ds='2008-08-15';

selects all rows from partition ds=2008-08-15 of the invites table into an HDFS directory. The result data
is in files (depending on the number of mappers) in that directory.
NOTE: partition columns, if any, are selected by the use of *. They can also
be specified in the projection clauses.

Partitioned tables must always have a partition selected in the WHERE clause of the statement.

  hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' SELECT a.* FROM pokes a;

selects all rows from the pokes table into a local directory.

  hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;
  hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a WHERE a.key < 100;
  hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.* FROM events a;
  hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_4' select a.invites, a.pokes FROM profiles a;
  hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(*) FROM invites a WHERE a.ds='2008-08-15';
  hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a;
  hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT SUM(a.pc) FROM pc1 a;

Sums up a column. avg, min and max can also be used. Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).

GROUP BY

  hive> FROM invites a INSERT OVERWRITE TABLE events SELECT a.bar, count(*) WHERE a.foo > 0 GROUP BY a.bar;
  hive> INSERT OVERWRITE TABLE events SELECT a.bar, count(*) FROM invites a WHERE a.foo > 0 GROUP BY a.bar;

Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).

JOIN

  hive> FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE events SELECT t1.bar, t1.foo, t2.foo;

MULTITABLE INSERT

  FROM src
  INSERT OVERWRITE TABLE dest1 SELECT src.* WHERE src.key < 100
  INSERT OVERWRITE TABLE dest2 SELECT src.key, src.value WHERE src.key >= 100 and src.key < 200
  INSERT OVERWRITE TABLE dest3 PARTITION(ds='2008-04-08', hr='12') SELECT src.key WHERE src.key >= 200 and src.key < 300
  INSERT OVERWRITE LOCAL DIRECTORY '/tmp/dest4.out' SELECT src.value WHERE src.key >= 300;

STREAMING

  hive> FROM invites a INSERT OVERWRITE TABLE events SELECT TRANSFORM(a.foo, a.bar) AS (oof, rab) USING '/bin/cat' WHERE a.ds > '2008-08-09';

This streams the data in the map phase through the script /bin/cat (like Hadoop streaming).
Similarly, streaming can be used on the reduce side (please see the Hive Tutorial for examples).
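
As a minimal sketch of reduce-side streaming, following the MAP/REDUCE pattern from the Hive Tutorial, with /bin/cat standing in for real map and reduce scripts and the events schema assumed to match the two output columns:

  FROM (
    FROM invites a
    MAP a.foo, a.bar
    USING '/bin/cat'
    AS oof, rab
    CLUSTER BY oof) map_output
  INSERT OVERWRITE TABLE events
  REDUCE map_output.oof, map_output.rab
  USING '/bin/cat'
  AS (foo, bar);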

Simple Example Use Cases

MovieLens User Ratings

First, create a table with tab-delimited text file format:

CREATE TABLE u_data (
  userid INT,
  movieid INT,
  rating INT,
  unixtime STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

Then, download and extract the data files:

wget http://www.grouplens.org/system/files/ml-data.tar+0.gz
tar xvzf ml-data.tar+0.gz

And load it into the table that was just created:

LOAD DATA LOCAL INPATH 'ml-data/u.data'
OVERWRITE INTO TABLE u_data;

Count the number of rows in table u_data:

SELECT COUNT(*) FROM u_data;

Note that for versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).

Now we can do some complex data analysis on the table u_data:

Create weekday_mapper.py:

import sys
import datetime

# Read tab-separated rows from stdin and replace the unixtime
# column with the ISO weekday number (1 = Monday .. 7 = Sunday).
for line in sys.stdin:
  line = line.strip()
  userid, movieid, rating, unixtime = line.split('\t')
  weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
  print('\t'.join([userid, movieid, rating, str(weekday)]))
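
Before wiring the script into Hive, it can be sanity-checked from the shell against a few rows of the extracted data (assuming the ml-data directory from the earlier step):

head -5 ml-data/u.data | python weekday_mapper.py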

Use the mapper script:

CREATE TABLE u_data_new (
  userid INT,
  movieid INT,
  rating INT,
  weekday INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

add FILE weekday_mapper.py;

INSERT OVERWRITE TABLE u_data_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)
  USING 'python weekday_mapper.py'
  AS (userid, movieid, rating, weekday)
FROM u_data;

SELECT weekday, COUNT(*)
FROM u_data_new
GROUP BY weekday;

Note that if you're using Hive 0.5.0 or earlier, you will need to use COUNT(1) in place of COUNT(*).

Apache Weblog Data

The format of Apache weblogs is customizable, though most webmasters use the default.
For the default Apache weblog format, we can create a table with the following commands.

More about RegexSerDe can be found here: http://issues.apache.org/jira/browse/HIVE-662

add jar ../build/contrib/hive_contrib.jar;

CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^]*) ([^]*) ([^]*) (-|\\[^\\]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?",
  "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE;
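
Once a log file has been loaded into apachelog (via LOAD DATA, as shown earlier), the parsed fields can be queried normally; for example, a sketch counting requests per status code:

SELECT status, COUNT(*) FROM apachelog GROUP BY status;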

