mtb2_VizQL_Cleaning_Regx phone email_CRISP-DM_pdf table to text then to Excel

2023-11-19

Changing field attribution

 Let us look at the World Happiness Report. We create the following worksheet with these steps:

  • Start Performance Recording.
  • Place AVG(Happiness Score) on the Columns shelf and Country on the Rows shelf.
  • Hold the Ctrl key and drag AVG(Happiness Score) to Label.
  • AVG(Happiness Score) is, of course, treated as a measure in this case.
  • Lastly, sort the countries by their happiness score, highest to lowest.
  • Stop Performance Recording.
    Wait a few minutes.

  • In the Query pane, right-click and select Copy | Data.
         Studying the SQL generated by VizQL to create the preceding visualization is particularly insightful:
    Command
    "
    SELECT ""Extract"". ""Country"" AS ""Country""
    FROM ""Extract"".""Extract"" ""Extract""
    GROUP BY 1
    "
    
    Command
    "
    SELECT AVG( CAST( ""Extract"". ""Happiness Rank & Happiness.Rank""
                     AS DOUBLE PRECISION OR NULL
                    )
              ) AS ""avg:Happiness Rank & Happiness.Rank:ok""
    FROM ""Extract"".""Extract"" ""Extract""
    HAVING (COUNT(1) > 0)
    "
    • Because the data source is an extract, the table is named Extract.
    • What does GROUP BY 1 mean?
      SELECT account_id, open_emp_id
               ^^^^        ^^^^
                1           2
      
      FROM account
      GROUP BY 1;

      In the query above, GROUP BY 1 refers to the first column in the SELECT list, which is account_id.

    • In the VizQL query, it therefore means "group by the first column in your SELECT clause (Country)".
      GROUP BY 1 is often paired with ORDER BY 1, which sorts by that same column.
    • The CAST() function converts a value of any type into the specified data type (here, DOUBLE PRECISION OR NULL).
    • A common misconception is that COUNT(1) counts the records in the first column (here, the Rank field).
      What COUNT(1) really does is evaluate the constant 1 for every row in the query result and count those rows. Because the constant is never NULL, rows that contain NULLs are counted too.
    • COUNT(1) > 0 is the condition in the HAVING clause, so the aggregate row is returned only when the table contains at least one row.
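
     The behavior of GROUP BY 1 and COUNT(1) described above can be checked with a small sketch. It uses Python's built-in sqlite3 module and a made-up happiness table rather than the Tableau extract, so the table, columns, and rows are assumptions for illustration only.

```python
import sqlite3

# Hypothetical in-memory table standing in for the extract.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE happiness (country TEXT, region TEXT, score REAL)")
conn.executemany(
    "INSERT INTO happiness VALUES (?, ?, ?)",
    [
        ("Denmark", "Western Europe", 7.5),
        ("Denmark", "Western Europe", 7.0),
        ("Togo", None, 3.5),   # region is missing
        ("Togo", None, None),  # region and score are missing
    ],
)

# GROUP BY 1 groups by the first column of the SELECT list (country).
by_position = conn.execute(
    "SELECT country, AVG(score) FROM happiness GROUP BY 1 ORDER BY 1"
).fetchall()
by_name = conn.execute(
    "SELECT country, AVG(score) FROM happiness GROUP BY country ORDER BY country"
).fetchall()

# COUNT(1) counts every row; COUNT(column) skips rows where that column is NULL.
count_rows, count_region, count_score = conn.execute(
    "SELECT COUNT(1), COUNT(region), COUNT(score) FROM happiness"
).fetchone()

print(by_position == by_name)
print(count_rows, count_region, count_score)
```

     The two grouped queries return the same rows, and COUNT(1) reports all four rows even though some of them contain NULLs.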

          Next, create a second worksheet called Score/Rank to analyze the scores relative to the ranks:

  1. Start Performance Recording
  2. Columns shelf: Happiness Rank (dimension, continuous)
  3. Rows shelf: AVG(Happiness Score) (measure, average, continuous)
  4. To get the steps, click Path on the Marks card and select the second option, Step.
  5. Stop Performance Recording
    SELECT ['Happiness Report$'].[Happiness.Rank] AS [Happiness.Rank],
           AVG( ['Happiness Report$'].[Happiness.Score] ) AS [avg:Happiness.Score:ok]
    FROM [dbo].['Happiness Report$'] ['Happiness Report$']
    GROUP BY ['Happiness Report$'].[Happiness.Rank]

    Because the data source is Happiness Report, the table is named after it ('Happiness Report$').
    dbo is the default schema in SQL Server. You can create your own schemas to allow you to better manage your object namespace.

  6. The GROUP BY clause clearly communicates that Happiness Rank is treated as a dimension because grouping is only possible on dimensions.

    The takeaway is that VizQL enables the analyst to change the generated SQL by changing a field from measure to dimension in Tableau, rather than by changing the source metadata. This on-the-fly ability enables creative exploration of the data that's not possible with other tools, and avoids lengthy exercises attempting to define all possible uses for each field.
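
     To make the idea concrete, here is a toy sketch of how a field's role could change the SQL that gets sent to the data source. The build_query function, the report table, and its rows are all hypothetical; this is not how VizQL is implemented, only an illustration of the dimension-versus-measure distinction.

```python
import sqlite3

def build_query(table, field, role, by=None):
    """Return illustrative SQL for one field. NOT VizQL's real generator."""
    if role == "dimension":
        # A dimension is selected as-is and grouped on.
        return f'SELECT "{field}" FROM "{table}" GROUP BY "{field}"'
    if role == "measure":
        # A measure is aggregated, optionally per value of a dimension.
        select = f'AVG("{field}") AS "avg:{field}"'
        if by is None:
            return f'SELECT {select} FROM "{table}"'
        return f'SELECT "{by}", {select} FROM "{table}" GROUP BY "{by}"'
    raise ValueError(f"unknown role: {role}")

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE report ("Happiness Rank" INTEGER, "Happiness Score" REAL)')
conn.executemany(
    "INSERT INTO report VALUES (?, ?)",
    [(1, 7.5), (1, 7.0), (2, 6.5)],  # made-up sample rows
)

# Rank treated as a dimension: one output row per rank.
rank_as_dimension = sorted(
    conn.execute(
        build_query("report", "Happiness Score", "measure", by="Happiness Rank")
    ).fetchall()
)

# Rank treated as a measure: a single aggregated row.
rank_as_measure = conn.execute(
    build_query("report", "Happiness Rank", "measure")
).fetchall()

print(rank_as_dimension)
print(rank_as_measure)
```

     The same column yields a GROUP BY query in one role and a plain aggregate in the other, without any change to the table itself.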

     The previous section taught us how we can manipulate data types in Tableau itself without touching the data source or its metadata. In the next section, we will take a closer look at table calculations.

Table calculation

     In this section, we will explore how VizQL's table calculations can be used to add data to a dashboard without adding any data to the data source.

  • Drag Freedom to the Rows shelf, then choose Quick Table Calculation | Moving Average and Compute Using | Table (across).
    https://blog.csdn.net/Linli522362242/article/details/124082543

         The moving average covers the current value and the two previous values. Where fewer than three values are available, Tableau averages the values that exist:
                 mv_1 = 1.88
                 mv_2 = (1.88 + 1.84) / 2 = 1.86
                 mv_3 = (1.88 + 1.84 + 1.843) / 3 ≈ 1.854
                 mv_4 = (1.84 + 1.843 + 1.886) / 3 ≈ 1.856, ...
                 The full window has length 3: offsets -2, -1 (previous value), and 0 (current value).
    This is similar to
    WINDOW_AVG(SUM([Freedom]), -2, 0)

  • Drag Happiness Rank to the Columns shelf (dimension, continuous).
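
     The moving-average rule used above (current value plus the two previous values, with a shorter window at the start) can be sketched in a few lines of Python. The moving_average helper is a hypothetical name for illustration, and the input list reuses the rounded Freedom values quoted in the example.

```python
def moving_average(values, previous=2, following=0):
    """Average each value with its neighbours, shrinking the window at the edges."""
    result = []
    for i in range(len(values)):
        start = max(0, i - previous)               # offset -previous, clipped at 0
        end = min(len(values), i + following + 1)  # offset +following, clipped at the end
        window = values[start:end]
        result.append(sum(window) / len(window))
    return result

freedom = [1.88, 1.84, 1.843, 1.886]  # rounded values from the example above
for i, mv in enumerate(moving_average(freedom), start=1):
    print(f"mv_{i} = {mv:.3f}")
```

     The previous and following parameters play the role of the -2 and 0 offsets in WINDOW_AVG.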

     In the previous example, which can be viewed by opening Sheet 4 in this chapter's workbook, note that Freedom on the vertical axis is set to Quick Table Calculation | Moving Average. Calculating a Moving Average, Running Total, or other such comparison calculations can be quite challenging to accomplish in a data source. Not only must a data architect consider what comparison calculations to include in the data source, but they must also determine the dimensions for which these calculations are relevant.

     Taking a look at the relevant portion of SQL generated by the preceding worksheet shows that the table calculation is not performed by the data source. Instead, it is performed in Tableau by the VizQL module.

Command
"SELECT ""Happiness Report_Full Data.csv"".""Happiness Rank"" AS ""Happiness Rank"",
        SUM(""Happiness Report_Full Data.csv"".""Freedom"") AS ""sum:Freedom:ok""
FROM ""TableauTemp"".""Happiness Report_Full Data#csv"" ""Happiness Report_Full Data.csv""
GROUP BY 1"

In the query above, GROUP BY 1 refers to the first column in the SELECT list, which is Happiness Rank. Alternatively, when the data source is the Happiness Report$ table, the query looks like this:

SELECT SUM([Happiness Report$].[Freedom]) AS [sum:Freedom:ok],
       [Happiness Report$].[Happiness.Rank] AS [Happiness.Rank]
FROM [dbo].[Happiness Report$] [Happiness Report$] 
GROUP BY [Happiness Report$].[Happiness.Rank]

     To reiterate, nothing in the preceding call to the data source generates the moving average. Only an aggregated total is returned, and Tableau calculates the moving average with VizQL.
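
     The division of labor described here can be imitated outside Tableau: the database only aggregates, and the client computes the moving average from the returned rows. The sketch below assumes a made-up report table in SQLite; the ORDER BY 1 is added only so that the window order is deterministic and is not part of the logged query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE report ("Happiness Rank" INTEGER, "Freedom" REAL)')
conn.executemany(
    "INSERT INTO report VALUES (?, ?)",
    [(1, 0.5), (1, 0.25), (2, 0.75), (3, 0.5), (4, 1.0)],  # made-up sample rows
)

# Step 1: the data source only aggregates, as in the logged query.
rows = conn.execute(
    'SELECT "Happiness Rank", SUM("Freedom") FROM report GROUP BY 1 ORDER BY 1'
).fetchall()

# Step 2: the client computes the table calculation over the returned rows.
sums = [total for _, total in rows]
moving_avg = []
for i in range(len(sums)):
    window = sums[max(0, i - 2): i + 1]  # offsets -2..0, shorter at the start
    moving_avg.append(sum(window) / len(window))

for (rank, total), avg in zip(rows, moving_avg):
    print(rank, total, round(avg, 3))
```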

Data mining and knowledge discovery process models

     Data modeling, data preparation, database design, data architecture—the question that arises is, how do these and other similar terms fit together? This is no easy question to answer! Terms may be used interchangeably in some contexts and be quite distinct in others. Also, understanding the interconnectivity of any technical jargon can be challenging.

     In the data world, data mining and knowledge discovery process models attempt to consistently define terms and contextually position and define the various data sub-disciplines. Since the early 1990s, various models have been proposed.

Survey of the process models

     In the following table, we can see a comparison of blueprints for conducting a data mining project with three data processing models, all of which are used to discover patterns and relationships in data in order to help make better business decisions.

     Later on, we will see how Tableau comes into play and makes this process easier and faster for us.

     Since CRISP-DM is used by four to five times the number of people as the closest competing model (SEMMA), it is the model we will consider in this chapter. For more information, see http://www.kdnuggets.com/2014/10/crisp-dm-top-methodology-analytics-data-mining-data-science-projects.html.

     The important takeaway is that each of these models grapples with the same problems, particularly concerning the understanding, preparing, modeling, and interpreting of data.

CRISP-DM

     Cross Industry Standard Process for Data Mining (CRISP-DM) was created between 1996 and 2000 as a result of a consortium including SPSS, Teradata, Daimler AG, NCR Corporation, and OHRA. It divides the process of data mining into six major phases, as shown in the CRISP-DM reference model in the preceding comparison table.

     This model provides a bird's-eye view of a data-mining project life cycle. The sequence of the phases is not rigid; jumping back and forth from phase to phase is allowed and expected. Data mining does not cease upon the completion of a particular project. Instead, it exists as long as the business exists, and should be constantly revisited to answer new questions as they arise.

     In the next section, we will consider each of the six phases that comprise CRISP-DM and explore how Tableau can be used throughout the life cycle. We will particularly focus on the data preparation phase, as that is the phase encompassing data cleaning, the focus of this chapter. By considering the following steps, you will be able to understand in more detail what a full data mining process cycle looks like under CRISP-DM. This framework can be used to make your workflow in Tableau more efficient by working according to an established model.

CRISP-DM phases

     In the following sections, we will briefly define each of the six CRISP-DM phases and include high-level information on how Tableau might be used.

Phase I – business understanding:

  • This phase determines the business objectives and corresponding data mining goals. It also assesses risks, costs, and contingencies, and culminates in a project plan.
  • Tableau is a natural fit for presenting information to enhance business understanding.

Phase II – data understanding:

  • This phase begins with an initial data collection exercise. The data is then explored to discover early insights and identify data quality issues.
  • Once the data is collected into one or more relational data sources, Tableau can be used to effectively explore the data and enhance data understanding.

Phase III – data preparation:

  • This phase includes data selection, cleaning, construction, merging, and formatting.
  • Tableau can be effectively used to identify the preparation tasks that need to occur; that is, Tableau can be used to quickly identify the data selection, cleaning, merging, and so on, that should be addressed. Additionally, Tableau can sometimes be used to do actual data preparation. We will walk through examples in the next section.

Phase IV – modeling:

  • In this phase, data modeling methods and techniques are considered and implemented in one or more data sources. It is important to choose an approach that works well with Tableau; for example, as discussed in Chapter 6, All About Data – Data Densification, Cubes, and Big Data, Tableau works better with relational data sources than with cubes.
  • Tableau has some limited data modeling capabilities, such as pivoting datasets through the data source page.

Phase V – evaluation:

  • The evaluation phase considers the results: do they meet the business goals with which we started the data mining process? Test the model on another dataset, for example, from another day or on a production dataset, and determine whether it works as well in the workplace as it did in your tests.
  • Tableau is an excellent fit for considering the results during this phase, as it is easy to change the input dataset as long as the metadata layer remains the same (for example, the column headers stay the same).

Phase VI – deployment:

  • This phase should begin with a carefully considered plan to ensure a smooth rollout. The plan should include ongoing monitoring and maintenance to ensure continued streamlined access to quality data. Although the phase officially ends with a final report and accompanying review, the data mining process, as stated earlier, continues for the life of the business. Therefore, this phase will always lead back to the previous five phases.
  • Tableau should certainly be considered a part of the deployment phase. Not only is it an excellent vehicle for delivering end-user reporting; it can also be used to report on the data mining process itself. For instance, Tableau can be used to report on the performance of the overall data delivery system and thus be an asset for ongoing monitoring and maintenance.
  • Tableau Server is the best fit for Phase VI. We will discuss this separate Tableau product in Chapter 14, Interacting with Tableau Server/Online.

     Now that we have learned what a full data mining cycle looks like (and looked like pre-Tableau) and understood that every step can be executed in Tableau, we can see why data people celebrate Tableau Software products.

     Tableau makes data mining much easier and more efficient, and replicating steps is also easier than it was without Tableau. In the next section, we will work through some hands-on examples to explore the content we've just learned.

Focusing on data preparation

     As discussed earlier, Tableau can be used effectively throughout the CRISP-DM phases. Unfortunately, a single chapter is not sufficient to thoroughly explore how Tableau can be used in each phase. Indeed, such a thorough exploration may be worthy of an entire book! Our focus, therefore, will be directed to data preparation, since that phase has historically accounted for up to 60% of the data mining effort. Our goal will be to learn how Tableau can be used to streamline that effort.

Surveying data

     Tableau can be a very effective tool for simply surveying data. Sometimes in the survey process, you may discover ways to clean the data or populate incomplete data based on existing fields. Sometimes, regretfully, there are simply not enough pieces of the puzzle to put together an entire dataset. In such cases, Tableau can be useful to communicate exactly what the gaps are, and this, in turn, may incentivize the organization to more fully populate the underlying data.

     In this exercise, we will explore how to use Tableau to quickly discover the percentage of null values for each field in a dataset. Next, we'll explore how the data might be extrapolated from existing fields to fill in the gaps.

Establishing null values

The following are the steps for surveying the data:

  2. Navigate to the worksheet entitled Surveying & Exploring Data.
  3. Drag Region and Country to the Rows shelf. Observe that the Region field is Null for some countries.
  4. Right-click the parameter entitled Select Field and choose Edit.... Note that the Data Type is set to Integer and that the list contains an entry for each field name in the dataset.
  5. In the Data pane, right-click on the Select Field parameter and select Show Parameter Control.
  6. Create a calculated field entitled % Populated and write the following calculation:
    SUM([Number of Records])/TOTAL( SUM([Number of Records]) )
    This code is the equivalent of the quick table calculation Percent of Total.
         In conjunction with the Null & Populated calculated field created in step 9, it allows us to see what percentage of each field is actually populated with values.

         The CASE statement in Null & Populated is a row-level calculation that considers each field in the dataset and determines which rows are populated and which are not.
          For example, in the WHEN 1 branch of that code, every row of the Country field is evaluated for nulls. The reason for this is that a calculated field adds a new column to the existing data (only in Tableau, not in the data source itself), and every row gets a value. These values can be N/A or null values.
  7. In the Data pane, right-click on % Populated and select Default Properties | Number Format….
  8. In the resulting dialog box, choose Percentage.
  9. Create a calculated field entitled Null & Populated and add the following code. Note that the complete case statement is fairly lengthy but is also repetitive.
    CASE [Select Field]
        WHEN 1 
            THEN IF ISNULL([Country])
                    THEN 'NULL Values' 
                 ELSE
                    'Populated Values'
                 END
        WHEN 2
            THEN IF ISNULL([Region])
                    THEN 'NULL Values'
                 ELSE
                    'Populated Values'
                 END
        WHEN 3
            THEN IF ISNULL([Economy (GDP per Capita)])
                    THEN 'NULL Values'
                 ELSE
                    'Populated Values'
                 END
        WHEN 4
            THEN IF ISNULL([Family])
                    THEN 'NULL Values'
                 ELSE
                    'Populated Values'
                 END
        WHEN 5
            THEN IF ISNULL([Freedom])
                    THEN 'NULL Values'
                 ELSE
                    'Populated Values'
                 END
        WHEN 6
            THEN IF ISNULL([Happiness Rank])
                    THEN 'NULL Values'
                 ELSE
                    'Populated Values'
                 END
        WHEN 7
            THEN IF ISNULL([Happiness Score])
                    THEN 'NULL Values'
                 ELSE
                    'Populated Values'
                 END
        WHEN 8
            THEN IF ISNULL([Health (Life Expectancy)])
                    THEN 'NULL Values'
                 ELSE
                    'Populated Values'
                 END
    
        WHEN 9
            THEN IF ISNULL([Standard Error])
                    THEN 'NULL Values'
                 ELSE
                    'Populated Values'
                 END
        WHEN 10 
            THEN IF ISNULL ([Region Extrapolated]) 
                    THEN 'Null Values' 
                 ELSE
                    'Populated Values'
                 END
    END

  10. Remove Region and Country from the Rows shelf.
  11. Place Null & Populated on the Rows and Color shelves and
    % Populated on the Columns and Label shelves.
  12. Change the colors to red for Null Values and green for Populated Values if desired. You can do so by clicking on Color in the Marks card and selecting Edit Colors.
  13. Click on the arrow in the upper right corner of the Select Field parameter on your sheet and select Single Value List.
  14. Select various choices in the Select Field parameter and note that some fields have a high percentage of null values. For example, 32.98% of records do not have a value for Region.
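
     The null survey built in the steps above can be sketched outside Tableau as well. The following Python example labels each row as null or populated for a chosen field and then computes each label's share of the total, which mirrors the combination of Null & Populated and % Populated. The sample rows and function names are assumptions for illustration, and None plays the role of a Tableau NULL.

```python
# Hypothetical sample rows; None plays the role of a Tableau NULL.
rows = [
    {"Country": "Denmark", "Region": "Western Europe"},
    {"Country": "Togo", "Region": None},
    {"Country": "Peru", "Region": "Latin America and Caribbean"},
    {"Country": "Laos", "Region": None},
]

def null_and_populated(row, field):
    """Row-level label, like one branch of the Null & Populated CASE statement."""
    return "NULL Values" if row[field] is None else "Populated Values"

def percent_populated(rows, field):
    """Share of rows per label: count per label divided by the total count."""
    counts = {"NULL Values": 0, "Populated Values": 0}
    for row in rows:
        counts[null_and_populated(row, field)] += 1
    total = len(rows)
    return {label: n / total for label, n in counts.items()}

for field in ("Country", "Region"):
    shares = percent_populated(rows, field)
    print(field, {label: f"{share:.0%}" for label, share in shares.items()})
```

     Passing the field name as an argument does the job of the Select Field parameter: one function serves every column instead of one hard-coded branch per field.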

     Building on this exercise, let's explore how we might clean and extrapolate data from existing data using the same dataset. 

Extrapolating data

     This exercise will expand on the previous exercise by cleaning existing data and populating some of the missing data from known information. We will assume that we know which country belongs to which region. We'll use that knowledge to fix errors in the Region field and also to fill in the gaps using Tableau.

  1. Starting from where the previous exercise ended, create a calculated field entitled Region Extrapolated with the code block that follows.

    The Region field in the dataset has a large percentage of null values, and even the existing data has errors. Based on our knowledge of the business (that is, which country belongs to which region), we can use the Country field to populate the dataset with accurate information:
CASE [Country]
WHEN 'Afghanistan' THEN 'Southern Asia'
WHEN 'Albania' THEN 'Central and Eastern Europe'
WHEN 'Algeria' THEN 'Middle East and Northern Africa'
WHEN 'Angola' THEN 'Sub-Saharan Africa'
WHEN 'Argentina' THEN 'Latin America and Caribbean'
WHEN 'Armenia' THEN 'Central and Eastern Europe'
WHEN 'Australia' THEN 'Australia and New Zealand'
WHEN 'Austria' THEN 'Western Europe'
WHEN 'Azerbaijan' THEN 'Central and Eastern Europe'
WHEN 'Bahrain' THEN 'Middle East and Northern Africa'
WHEN 'Bangladesh' THEN 'Southern Asia'
WHEN 'Belarus' THEN 'Central and Eastern Europe'
WHEN 'Belgium' THEN 'Western Europe'
WHEN 'Belize' THEN 'Latin America and Caribbean'
WHEN 'Benin' THEN 'Sub-Saharan Africa'
WHEN 'Bhutan' THEN 'Southern Asia'
WHEN 'Bolivia' THEN 'Latin America and Caribbean'
WHEN 'Bosnia and Herzegovina' THEN 'Central and Eastern Europe'
WHEN 'Botswana' THEN 'Sub-Saharan Africa'
WHEN 'Brazil' THEN 'Latin America and Caribbean'
WHEN 'Bulgaria' THEN 'Central and Eastern Europe'
WHEN 'Burkina Faso' THEN 'Sub-Saharan Africa'
WHEN 'Burundi' THEN 'Sub-Saharan Africa'
WHEN 'Cambodia' THEN 'Southeastern Asia'
WHEN 'Cameroon' THEN 'Sub-Saharan Africa'
WHEN 'Canada' THEN 'North America'
WHEN 'Central African Republic' THEN 'Sub-Saharan Africa'
WHEN 'Chad' THEN 'Sub-Saharan Africa'
WHEN 'Chile' THEN 'Latin America and Caribbean'
WHEN 'China' THEN 'Eastern Asia'
WHEN 'Colombia' THEN 'Latin America and Caribbean'
WHEN 'Comoros' THEN 'Sub-Saharan Africa'
WHEN 'Congo (Brazzaville)' THEN 'Sub-Saharan Africa'
WHEN 'Congo (Kinshasa)' THEN 'Sub-Saharan Africa'
WHEN 'Costa Rica' THEN 'Latin America and Caribbean'
WHEN 'Croatia' THEN 'Central and Eastern Europe'
WHEN 'Cyprus' THEN 'Western Europe'
WHEN 'Czech Republic' THEN 'Central and Eastern Europe'
WHEN 'Denmark' THEN 'Western Europe'
WHEN 'Djibouti' THEN 'Sub-Saharan Africa'
WHEN 'Dominican Republic' THEN 'Latin America and Caribbean'
WHEN 'Ecuador' THEN 'Latin America and Caribbean'
WHEN 'Egypt' THEN 'Middle East and Northern Africa'
WHEN 'El Salvador' THEN 'Latin America and Caribbean'
WHEN 'Estonia' THEN 'Central and Eastern Europe'
WHEN 'Ethiopia' THEN 'Sub-Saharan Africa'
WHEN 'Finland' THEN 'Western Europe'
WHEN 'France' THEN 'Western Europe'
WHEN 'Gabon' THEN 'Sub-Saharan Africa'
WHEN 'Georgia' THEN 'Central and Eastern Europe'
WHEN 'Germany' THEN 'Western Europe'
WHEN 'Ghana' THEN 'Sub-Saharan Africa'
WHEN 'Greece' THEN 'Western Europe'
WHEN 'Guatemala' THEN 'Latin America and Caribbean'
WHEN 'Guinea' THEN 'Sub-Saharan Africa'
WHEN 'Haiti' THEN 'Latin America and Caribbean'
WHEN 'Honduras' THEN 'Latin America and Caribbean'
WHEN 'Hong Kong' THEN 'Eastern Asia'
WHEN 'Hong Kong S.A.R., China' THEN 'Eastern Asia'
WHEN 'Hungary' THEN 'Central and Eastern Europe'
WHEN 'Iceland' THEN 'Western Europe'
WHEN 'India' THEN 'Southern Asia'
WHEN 'Indonesia' THEN 'Southeastern Asia'
WHEN 'Iran' THEN 'Middle East and Northern Africa'
WHEN 'Iraq' THEN 'Middle East and Northern Africa'
WHEN 'Ireland' THEN 'Western Europe'
WHEN 'Israel' THEN 'Middle East and Northern Africa'
WHEN 'Italy' THEN 'Western Europe'
WHEN 'Ivory Coast' THEN 'Sub-Saharan Africa'
WHEN 'Jamaica' THEN 'Latin America and Caribbean'
WHEN 'Japan' THEN 'Eastern Asia'
WHEN 'Jordan' THEN 'Middle East and Northern Africa'
WHEN 'Kazakhstan' THEN 'Central and Eastern Europe'
WHEN 'Kenya' THEN 'Sub-Saharan Africa'
WHEN 'Kosovo' THEN 'Central and Eastern Europe'
WHEN 'Kuwait' THEN 'Middle East and Northern Africa'
WHEN 'Kyrgyzstan' THEN 'Central and Eastern Europe'
WHEN 'Laos' THEN 'Southeastern Asia'
WHEN 'Latvia' THEN 'Central and Eastern Europe'
WHEN 'Lebanon' THEN 'Middle East and Northern Africa'
WHEN 'Lesotho' THEN 'Sub-Saharan Africa'
WHEN 'Liberia' THEN 'Sub-Saharan Africa'
WHEN 'Libya' THEN 'Middle East and Northern Africa'
WHEN 'Lithuania' THEN 'Central and Eastern Europe'
WHEN 'Luxembourg' THEN 'Western Europe'
WHEN 'Macedonia' THEN 'Central and Eastern Europe'
WHEN 'Madagascar' THEN 'Sub-Saharan Africa'
WHEN 'Malawi' THEN 'Sub-Saharan Africa'
WHEN 'Malaysia' THEN 'Southeastern Asia'
WHEN 'Mali' THEN 'Sub-Saharan Africa'
WHEN 'Malta' THEN 'Western Europe'
WHEN 'Mauritania' THEN 'Sub-Saharan Africa'
WHEN 'Mauritius' THEN 'Sub-Saharan Africa'
WHEN 'Mexico' THEN 'Latin America and Caribbean'
WHEN 'Moldova' THEN 'Central and Eastern Europe'
WHEN 'Mongolia' THEN 'Eastern Asia'
WHEN 'Montenegro' THEN 'Central and Eastern Europe'
WHEN 'Morocco' THEN 'Middle East and Northern Africa'
WHEN 'Mozambique' THEN 'Sub-Saharan Africa'
WHEN 'Myanmar' THEN 'Southeastern Asia'
WHEN 'Namibia' THEN 'Sub-Saharan Africa'
WHEN 'Nepal' THEN 'Southern Asia'
WHEN 'Netherlands' THEN 'Western Europe'
WHEN 'New Zealand' THEN 'Australia and New Zealand'
WHEN 'Nicaragua' THEN 'Latin America and Caribbean'
WHEN 'Niger' THEN 'Sub-Saharan Africa'
WHEN 'Nigeria' THEN 'Sub-Saharan Africa'
WHEN 'North Cyprus' THEN 'Western Europe'
WHEN 'Norway' THEN 'Western Europe'
WHEN 'Oman' THEN 'Middle East and Northern Africa'
WHEN 'Pakistan' THEN 'Southern Asia'
WHEN 'Palestinian Territories' THEN 'Middle East and Northern Africa'
WHEN 'Panama' THEN 'Latin America and Caribbean'
WHEN 'Paraguay' THEN 'Latin America and Caribbean'
WHEN 'Peru' THEN 'Latin America and Caribbean'
WHEN 'Philippines' THEN 'Southeastern Asia'
WHEN 'Poland' THEN 'Central and Eastern Europe'
WHEN 'Portugal' THEN 'Western Europe'
WHEN 'Puerto Rico' THEN 'Latin America and Caribbean'
WHEN 'Qatar' THEN 'Middle East and Northern Africa'
WHEN 'Romania' THEN 'Central and Eastern Europe'
WHEN 'Russia' THEN 'Central and Eastern Europe'
WHEN 'Rwanda' THEN 'Sub-Saharan Africa'
WHEN 'Saudi Arabia' THEN 'Middle East and Northern Africa'
WHEN 'Senegal' THEN 'Sub-Saharan Africa'
WHEN 'Serbia' THEN 'Central and Eastern Europe'
WHEN 'Sierra Leone' THEN 'Sub-Saharan Africa'
WHEN 'Singapore' THEN 'Southeastern Asia'
WHEN 'Slovakia' THEN 'Central and Eastern Europe'
WHEN 'Slovenia' THEN 'Central and Eastern Europe'
WHEN 'Somalia' THEN 'Sub-Saharan Africa'
WHEN 'Somaliland region' THEN 'Sub-Saharan Africa'
WHEN 'South Africa' THEN 'Sub-Saharan Africa'
WHEN 'South Korea' THEN 'Eastern Asia'
WHEN 'South Sudan' THEN 'Sub-Saharan Africa'
WHEN 'Spain' THEN 'Western Europe'
WHEN 'Sri Lanka' THEN 'Southern Asia'
WHEN 'Sudan' THEN 'Sub-Saharan Africa'
WHEN 'Suriname' THEN 'Latin America and Caribbean'
WHEN 'Swaziland' THEN 'Sub-Saharan Africa'
WHEN 'Sweden' THEN 'Western Europe'
WHEN 'Switzerland' THEN 'Western Europe'
WHEN 'Syria' THEN 'Middle East and Northern Africa'
WHEN 'Taiwan' THEN 'Eastern Asia'
WHEN 'Taiwan Province of China' THEN 'Eastern Asia'
WHEN 'Tajikistan' THEN 'Central and Eastern Europe'
WHEN 'Tanzania' THEN 'Sub-Saharan Africa'
WHEN 'Thailand' THEN 'Southeastern Asia'
WHEN 'Togo' THEN 'Sub-Saharan Africa'
WHEN 'Trinidad and Tobago' THEN 'Latin America and Caribbean'
WHEN 'Tunisia' THEN 'Middle East and Northern Africa'
WHEN 'Turkey' THEN 'Middle East and Northern Africa'
WHEN 'Turkmenistan' THEN 'Central and Eastern Europe'
WHEN 'Uganda' THEN 'Sub-Saharan Africa'
WHEN 'Ukraine' THEN 'Central and Eastern Europe'
WHEN 'United Arab Emirates' THEN 'Middle East and Northern Africa'
WHEN 'United Kingdom' THEN 'Western Europe'
WHEN 'United States' THEN 'North America'
WHEN 'Uruguay' THEN 'Latin America and Caribbean'
WHEN 'Uzbekistan' THEN 'Central and Eastern Europe'
WHEN 'Venezuela' THEN 'Latin America and Caribbean'
WHEN 'Vietnam' THEN 'Southeastern Asia'
WHEN 'Yemen' THEN 'Middle East and Northern Africa'
WHEN 'Zambia' THEN 'Sub-Saharan Africa'
WHEN 'Zimbabwe' THEN 'Sub-Saharan Africa'
END

 

     To speed up the tedious creation of a long calculated field, you could download the data to an Excel file and create the calculated field by concatenating the separate parts, as shown here:

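=$B$2&" "&$C$2&A4&$C$2&" "&$D$2&" "&$C$2&B4&$C$2

Here B2 holds WHEN, C2 holds a single quote ('), D2 holds THEN, and columns A and B hold the country and the region, so each row evaluates to:
WHEN 'Country' THEN 'Region'

Excel treats a leading single quote in a cell as a text-prefix marker, so type two single quotes ('') in C2 to store one literal single quote.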
     Country

Afghanistan
Albania
Algeria
Angola
Argentina
Armenia
Australia
Austria
Azerbaijan
Bahrain
Bangladesh
Belarus
Belgium
Belize
Benin
Bhutan
Bolivia
Bosnia and Herzegovina
Botswana
Brazil
Bulgaria
Burkina Faso
Burundi
Cambodia
Cameroon
Canada
Central African Republic
Chad
Chile
China
Colombia
Comoros
Congo (Brazzaville)
Congo (Kinshasa)
Costa Rica
Croatia
Cyprus
Czech Republic
Denmark
Djibouti
Dominican Republic
Ecuador
Egypt
El Salvador
Estonia
Ethiopia
Finland
France
Gabon
Georgia
Germany
Ghana
Greece
Guatemala
Guinea
Haiti
Honduras
Hong Kong
Hong Kong S.A.R., China
Hungary
Iceland
India
Indonesia
Iran
Iraq
Ireland
Israel
Italy
Ivory Coast
Jamaica
Japan
Jordan
Kazakhstan
Kenya
Kosovo
Kuwait
Kyrgyzstan
Laos
Latvia
Lebanon
Lesotho
Liberia
Libya
Lithuania
Luxembourg
Macedonia
Madagascar
Malawi
Malaysia
Mali
Malta
Mauritania
Mauritius
Mexico
Moldova
Mongolia
Montenegro
Morocco
Mozambique
Myanmar
Namibia
Nepal
Netherlands
New Zealand
Nicaragua
Niger
Nigeria
North Cyprus
Norway
Oman
Pakistan
Palestinian Territories
Panama
Paraguay
Peru
Philippines
Poland
Portugal
Puerto Rico
Qatar
Romania
Russia
Rwanda
Saudi Arabia
Senegal
Serbia
Sierra Leone
Singapore
Slovakia
Slovenia
Somalia
Somaliland region
South Africa
South Korea
South Sudan
Spain
Sri Lanka
Sudan
Suriname
Swaziland
Sweden
Switzerland
Syria
Taiwan
Taiwan Province of China
Tajikistan
Tanzania
Thailand
Togo
Trinidad and Tobago
Tunisia
Turkey
Turkmenistan
Uganda
Ukraine
United Arab Emirates
United Kingdom
United States
Uruguay
Uzbekistan
Venezuela
Vietnam
Yemen
Zambia
Zimbabwe

     Region

Southern Asia
Central and Eastern Europe
Middle East and Northern Africa
Sub-Saharan Africa
Latin America and Caribbean
Central and Eastern Europe
Australia and New Zealand
Western Europe
Central and Eastern Europe
Middle East and Northern Africa
Southern Asia
Central and Eastern Europe
Western Europe
Latin America and Caribbean
Sub-Saharan Africa
Southern Asia
Latin America and Caribbean
Central and Eastern Europe
Sub-Saharan Africa
Latin America and Caribbean
Central and Eastern Europe
Sub-Saharan Africa
Sub-Saharan Africa
Southeastern Asia
Sub-Saharan Africa
North America
Sub-Saharan Africa
Sub-Saharan Africa
Latin America and Caribbean
Eastern Asia
Latin America and Caribbean
Sub-Saharan Africa
Sub-Saharan Africa
Sub-Saharan Africa
Latin America and Caribbean
Central and Eastern Europe
Western Europe
Central and Eastern Europe
Western Europe
Sub-Saharan Africa
Latin America and Caribbean
Latin America and Caribbean
Middle East and Northern Africa
Latin America and Caribbean
Central and Eastern Europe
Sub-Saharan Africa
Western Europe
Western Europe
Sub-Saharan Africa
Central and Eastern Europe
Western Europe
Sub-Saharan Africa
Western Europe
Latin America and Caribbean
Sub-Saharan Africa
Latin America and Caribbean
Latin America and Caribbean
Eastern Asia
Eastern Asia
Central and Eastern Europe
Western Europe
Southern Asia
Southeastern Asia
Middle East and Northern Africa
Middle East and Northern Africa
Western Europe
Middle East and Northern Africa
Western Europe
Sub-Saharan Africa
Latin America and Caribbean
Eastern Asia
Middle East and Northern Africa
Central and Eastern Europe
Sub-Saharan Africa
Central and Eastern Europe
Middle East and Northern Africa
Central and Eastern Europe
Southeastern Asia
Central and Eastern Europe
Middle East and Northern Africa
Sub-Saharan Africa
Sub-Saharan Africa
Middle East and Northern Africa
Central and Eastern Europe
Western Europe
Central and Eastern Europe
Sub-Saharan Africa
Sub-Saharan Africa
Southeastern Asia
Sub-Saharan Africa
Western Europe
Sub-Saharan Africa
Sub-Saharan Africa
Latin America and Caribbean
Central and Eastern Europe
Eastern Asia
Central and Eastern Europe
Middle East and Northern Africa
Sub-Saharan Africa
Southeastern Asia
Sub-Saharan Africa
Southern Asia
Western Europe
Australia and New Zealand
Latin America and Caribbean
Sub-Saharan Africa
Sub-Saharan Africa
Western Europe
Western Europe
Middle East and Northern Africa
Southern Asia
Middle East and Northern Africa
Latin America and Caribbean
Latin America and Caribbean
Latin America and Caribbean
Southeastern Asia
Central and Eastern Europe
Western Europe
Latin America and Caribbean
Middle East and Northern Africa
Central and Eastern Europe
Central and Eastern Europe
Sub-Saharan Africa
Middle East and Northern Africa
Sub-Saharan Africa
Central and Eastern Europe
Sub-Saharan Africa
Southeastern Asia
Central and Eastern Europe
Central and Eastern Europe
Sub-Saharan Africa
Sub-Saharan Africa
Sub-Saharan Africa
Eastern Asia
Sub-Saharan Africa
Western Europe
Southern Asia
Sub-Saharan Africa
Latin America and Caribbean
Sub-Saharan Africa
Western Europe
Western Europe
Middle East and Northern Africa
Eastern Asia
Eastern Asia
Central and Eastern Europe
Sub-Saharan Africa
Southeastern Asia
Sub-Saharan Africa
Latin America and Caribbean
Middle East and Northern Africa
Middle East and Northern Africa
Central and Eastern Europe
Sub-Saharan Africa
Central and Eastern Europe
Middle East and Northern Africa
Western Europe
North America
Latin America and Caribbean
Central and Eastern Europe
Latin America and Caribbean
Southeastern Asia
Middle East and Northern Africa
Sub-Saharan Africa
Sub-Saharan Africa

     You can then copy these lines (WHEN 'Country' THEN 'Region') from Excel into Tableau. For this exercise, however, I have created a field called Backup, which can be found in the Tableau workbook associated with this chapter and which contains the full calculation needed for the Region Extrapolated field. Use it at your convenience. The Solutions dashboard also contains all of the countries, so you can copy the Region Extrapolated field from that file too.
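     If you would rather not build the WHEN lines in Excel, they can also be generated with a few lines of Python. The following is only a minimal sketch: the three country/region pairs are sample values, the helper name case_lines is made up for illustration, and it assumes that no country name contains a single quote.

```python
# Sample pairs; the real list would come from the Country and Region columns.
pairs = [
    ("Switzerland", "Western Europe"),
    ("Japan", "Eastern Asia"),
    ("Somaliland Region", "Sub-Saharan Africa"),
]

def case_lines(pairs, field="Country"):
    """Build the text of a Tableau CASE calculation from (country, region) pairs.

    Assumes the names contain no single quotes, so no escaping is attempted.
    """
    lines = [f"CASE [{field}]"]
    for country, region in pairs:
        lines.append(f"    WHEN '{country}' THEN '{region}'")
    lines.append("END")
    return "\n".join(lines)

print(case_lines(pairs))
```

     The printed text can be pasted into the calculated field editor as the body of Region Extrapolated.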

  • 2. Add a Region Extrapolated option to the Select Field parameter:
  • 3. Add the following code to the Null & Populated calculated field:
        WHEN 10 
            THEN IF ISNULL ([Region Extrapolated]) 
                    THEN 'Null Values' 
                 ELSE
                    '% Populated Values'
                 END

  • Note that the Region Extrapolated field is not fully populated:

     Right-click the percentage of Null Values (0.21%) ==> View Data ==> Full Data

     Append the following code to Region Extrapolated:

    WHEN 'Somaliland Region' THEN 'Sub-Saharan Africa'

     Another solution: use CASE LOWER([Country]) and write every country name in lower case.
    Now the Region Extrapolated field is fully populated:

     Nulls are a part of almost every extensive real dataset. Understanding how many nulls are present in each field can be vital to ensuring that you provide accurate business intelligence. It may be acceptable to tolerate some null values when the final results will not be substantially impacted, but too many nulls may invalidate results. However, as demonstrated here, in some cases one or more fields can be used to extrapolate the values that should be entered into an underpopulated or erroneously populated field.
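     The same idea can be tried outside Tableau. The sketch below is illustrative only: the rows are made-up sample data, and the names pct_null, extrapolate and region_by_country are hypothetical. It measures how many regions are null, then fills the region from the country name through a lower-cased lookup, much like CASE LOWER([Country]).

```python
# Made-up sample rows: (country, region); None marks a missing region.
rows = [
    ("Switzerland", "Western Europe"),
    ("switzerland", None),
    ("Somaliland Region", None),
    ("Japan", "Eastern Asia"),
]

# Lookup keyed on lower-cased country names, similar to CASE LOWER([Country]).
region_by_country = {
    "switzerland": "Western Europe",
    "japan": "Eastern Asia",
    "somaliland region": "Sub-Saharan Africa",
}

def pct_null(values):
    """Return the percentage of entries that are None."""
    values = list(values)
    return 100.0 * sum(v is None for v in values) / len(values)

def extrapolate(country):
    """Look up the region for a country; returns None when the country is unknown."""
    return region_by_country.get(country.strip().lower())

before = [region for _, region in rows]
after = [extrapolate(country) for country, _ in rows]

print(f"null before: {pct_null(before):.1f}%")
print(f"null after:  {pct_null(after):.1f}%")
```

     Any country missing from the lookup still comes back as None, which is the same signal that the Null Values percentage gives in the worksheet.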

As demonstrated in this section, Tableau gives you the ability to effectively communicate to your data team which values are missing, which are erroneous, and how possible workarounds can be invaluable to the overall data mining effort. Next, we will look into data that is a bit messier and not in a nice column format. Don't worry, Tableau has us covered. 

Cleaning messy data 

     The United States government provides helpful documentation for various bureaucratic processes. For example, the Department of Health and Human Services (HHS) provides lists of ICD-9 codes, otherwise known as International Statistical Classification of Diseases and Related Health Problems codes. Unfortunately, these codes are not always in easily accessible formats. 

     As an example, let's consider an actual HHS document known as R756OTN, which can be found at https://www.cms.gov/Regulations-and-Guidance/Guidance/Transmittals/downloads/R756OTN.pdf

Convert the pdf to Excel

  • Download Adobe Acrobat and install it: http://www.downza.cn/soft/20562.html
  • File ==> Export To... ==> Text(Plain) ==>

  • Clean the data by removing everything except the 'Diagnosis Code' and 'Description' values.
    For example:
    Ctrl + F ==> delete
    If you see anything unexpected in the exported text,
    check it against the PDF document
    and then process it.

    Cleaned up text

  • python code
    ==>strip()==>

    import pandas as pd

    filename = 'R756OTN.txt'
    txt = []
    # The export is read as cp1252 here; some files need 'utf-8' instead.
    with open(filename, 'r', encoding='cp1252') as infile:
        for line in infile:
            line = line.strip()   # remove leading and trailing whitespace
            if not line:          # skip empty lines
                continue
            txt.append(line)

    # The cleaned text alternates: code, description, code, description, ...
    # An odd line count means a code or description is missing somewhere.
    assert len(txt) % 2 == 0, 'expected an even number of non-empty lines'
    diagnosis_code = txt[0::2]
    description = txt[1::2]
    print(len(txt), len(diagnosis_code))

    # Pair each code with its description (avoid shadowing the built-in dict).
    rows = list(zip(diagnosis_code, description))
    df = pd.DataFrame(rows, columns=['Diagnosis Code', 'Description'])
    df.to_csv('code_description.csv', index=False)

    df.head(n=10)


    The resulting file is cleaner than the author's dataset file.

Cleaning the data 

     Navigate to the Cleaning the Data worksheet in this workbook and execute the following steps:

  • 1. Within the Data pane, select the R756OTN Raw data source:
  • 2. Drag Diagnosis to the Rows shelf and choose Add all members. Note the junk data that occurs in some rows:
  • 3. Create a calculated field named DX with the following code:
    //                delimiter, token number        
    SPLIT( [Diagnosis], " ", 1 )
         SPLIT returns a substring of a string: the string is divided at each occurrence of the delimiter, and the token number selects which piece to return, counting from the beginning or the end of the string.

         This function can also be called directly in the Data Source tab when clicking on a column header and selecting Split. To extract characters from the end of the string, the token number (that is, the number at the end of the function) must be negative.

    Here it extracts the first token, which is the diagnosis code.
  • 4. Create a calculated field named Null Hunting with the following code:
    INT( MID([DX],2,1) )

         The use of MID is quite straightforward, and is much the same as the corresponding function in Excel. The use of INT in this case, however, may be confusing. Casting an alpha character with an INT function will result in Tableau returning Null. This satisfactorily fulfills our purpose, since we simply need to discover those rows whose second character is not a digit by locating the nulls.

         In other words, the second character of a valid diagnosis code must be a digit. When it is not, the calculation returns Null, which lets us exclude the useless lines (anything that is not a diagnosis code and description).
  • 5. In the Data pane, convert Null Hunting from Measures to Dimensions (discrete).
  • 6. Drag Diagnosis, DX, and Null Hunting to the Rows shelf. Observe that Null is returned when the second character in the Diagnosis field is not numeric:
  • 7. Create a calculated field named Exclude from ICD Codes containing the following code:
    ISNULL([Null Hunting])
    ISNULL is a Boolean function that simply returns TRUE when its argument is Null.
  • 8. Clear the sheet of all fields, as demonstrated in Chapter 1, Getting Up to Speed – a Review of the Basics, and set the Marks card to Shape.
  • 9. Place Exclude from ICD Codes on the Rows, Color, and Shape shelves, and then place DX on the Rows shelf. Observe the rows labeled as True:
  • 10. In order to exclude the junk data (that is, those rows where Exclude from ICD Codes equates to TRUE ), place Exclude from ICD Codes on the Filter shelf and deselect True.
  • 11. Create a calculated field named Diagnosis Text containing the following code:
    REPLACE([Diagnosis], [DX]+" ", "")

    This removes the diagnosis code and the space that follows it ([DX] + " ") from the Diagnosis string, which has the form Diagnosis Code + " " + Description. What remains is the description, stored as Diagnosis Text.
  • 12. Place Diagnosis Text on the Rows shelf after DX. Also, remove Exclude from ICD Codes(step10) from the Rows shelf and the Marks Card, and set the mark type to Automatic:

     The final output for this exercise could be to export the data from Tableau as an additional source of data. This data could then be used by Tableau and other tools for future reporting needs. For example, the DX field could be useful in data blending.
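     To see the logic of steps 3 to 11 in one place, the calculated fields can be imitated in Python. This is a rough sketch rather than a description of how Tableau evaluates them: the sample lines are made up, the function names are hypothetical, and returning an empty string for a missing token is an assumption.

```python
def split_token(text, delimiter, token_number):
    """Rough stand-in for SPLIT: 1-based tokens, negative numbers count from the end.

    token_number is assumed to be non-zero.
    """
    tokens = text.split(delimiter)
    index = token_number - 1 if token_number > 0 else token_number
    try:
        return tokens[index]
    except IndexError:
        return ""  # assumption: a missing token is treated as an empty string

def null_hunting(dx):
    """Imitate INT(MID([DX], 2, 1)): the second character as an int, else None."""
    second = dx[1:2]
    return int(second) if second.isascii() and second.isdigit() else None

def clean(lines):
    """Return (DX, Diagnosis Text) pairs for lines that look like ICD rows."""
    cleaned = []
    for diagnosis in lines:
        dx = split_token(diagnosis, " ", 1)
        if null_hunting(dx) is None:            # Exclude from ICD Codes is TRUE
            continue
        text = diagnosis.replace(dx + " ", "")  # Diagnosis Text
        cleaned.append((dx, text))
    return cleaned

# Made-up sample lines: two junk rows and two code rows.
raw = [
    "Diagnosis Code Description",
    "0011 SAMPLE DESCRIPTION A",
    "Page 2 of 40",
    "V0179 SAMPLE DESCRIPTION B",
]
for code, text in clean(raw):
    print(code, "|", text)
```

     Testing the second character rather than the first keeps codes that begin with a letter, such as the made-up V0179 row, while still dropping the header and page lines.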

     Does Tableau offer a better approach that might solve the issue of truncated data associated with the preceding solution? Yes! Let's turn our attention to the next exercise, where we will consider regular expression functions.

Extracting data

     Although, as shown in the previous exercise, Cleaning the data, the SPLIT function can be useful for cleaning data, regular expression functions are far more powerful and represent a broadening of the scope from Tableau's traditional focus on visualization and analytics to also include data cleaning capabilities. Let's look at an example that requires us to deal with some pretty messy data in Tableau. Our objective will be to extract phone numbers. 

The following are the steps:

  • 2. Select the Extracting the Data Worksheet.
  • 3. In the Data pane, select the String of Data data source and drag the String of Data field to the Rows shelf. Observe the challenges associated with extracting the phone numbers:
  • 4. Access the underlying data by clicking the View data button and copy several rows:
  • 5. Navigate to https://www.regexpal.com/ and paste the data into the pane labeled Test String; that is, the second pane:
  • 6. In the first pane (the one labeled Regular Expression), type the following:
    \([0-9]{3}\)-[0-9]{3}-[0-9]{4}
     
    • "\" : An escape character. The \ indicates that the next character should be treated
      as a literal rather than as a special character.
      In our example, we are literally looking for an opening or closing parenthesis: \( or \)
    • [0-9] simply declares that we are looking for digits from 0 to 9. Alternatively, consider \d to achieve the same result.
    • The {3} designates that we are looking for three consecutive digits.
    • As with the opening parenthesis at the beginning of the pattern, the \ character designates the closing parenthesis as a literal.
    • The - is a literal that specifically looks for a hyphen.
      Phone number regex:
      \({0,1}[0-9]{3}\){0,1}-{0,1}[0-9]{3}-{0,1}[0-9]{4}

      {n,m}

      m and n are non-negative integers with n <= m. The preceding element is matched at least n times and at most m times. Here, {0,1} makes each parenthesis and hyphen optional.

      Email regex:https://blog.csdn.net/Linli522362242/article/details/90139705
      [A-Za-z0-9\._+]+@[A-Za-z]+\.(com|org|edu|net)
  • 7. Return to Tableau and create a calculated field called Phone Number with the following code block. Note the regular expression nested in the calculated field:
    REGEXP_EXTRACT( [String of Data (String of Data)],
                    '(\([0-9]{3}\)-[0-9]{3}-[0-9]{4})'
                  )
         The outer '()' is a capturing group: it marks the part of the match that the function returns, which here is the whole pattern. The REGEXP_EXTRACT function used in this example is described in Tableau's help documentation as follows:

         Returns a substring of the given string that matches the capturing group within the regular expression pattern.

         Note that as of the time of writing, the Tableau documentation does not communicate how to ensure that the pattern input section of the function is properly delimited. For this example, be sure to include '()' around the pattern input section to avoid a null output.

         Nesting within a calculated field that is itself nested within a VizQL query can affect performance (if there are too many levels of nesting/aggregation).

    Create a calculated field called Email with the following code block. Note the regular expression nested in the calculated field:
    REGEXP_EXTRACT([String of Data (String of Data)],
                    '([A-Za-z0-9\._+]+@[A-Za-z]+\.(com|org|edu|net))'
                  )
  • Place Phone Number and Email on the Rows shelf, and observe the result:
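     The patterns from step 6 can also be checked outside Tableau before they go into a calculated field. The following is a minimal sketch using Python's re module. The two sample strings are invented, first_match is a hypothetical helper, and Python's regular expression engine is not the one Tableau uses, so treat it as a quick sanity check only.

```python
import re

# Patterns copied from the steps above.
STRICT_PHONE = re.compile(r"\([0-9]{3}\)-[0-9]{3}-[0-9]{4}")
FLEXIBLE_PHONE = re.compile(r"\({0,1}[0-9]{3}\){0,1}-{0,1}[0-9]{3}-{0,1}[0-9]{4}")
EMAIL = re.compile(r"[A-Za-z0-9\._+]+@[A-Za-z]+\.(com|org|edu|net)")

def first_match(pattern, text):
    """Return the first full match of pattern in text, or None if there is none."""
    match = pattern.search(text)
    return match.group(0) if match else None

# Invented sample strings standing in for the String of Data field.
samples = [
    "call (555)-123-4567 or write to jane.doe@example.com today",
    "new number 5551234567, no email on file",
]

for text in samples:
    print(first_match(STRICT_PHONE, text),
          first_match(FLEXIBLE_PHONE, text),
          first_match(EMAIL, text))
```

     The second sample shows why the {0,1} version is useful: the strict pattern finds nothing when the parentheses and hyphens are missing, while the flexible one still returns the ten digits.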

     After reviewing this exercise, you may be curious about how to return just the email address. According to http://www.regular-expressions.info/email.html, the regular expression for email addresses adhering to the RFC 5322 standard is as follows:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

     Although I won't attempt a detailed explanation of this code, you can read all about it at http://www.regular-expressions.info/email.html , which is a great resource for learning more about regular expressions. Also, YouTube has several helpful regular expression tutorials.

     The final output for this exercise should probably be used to enhance existing source data. Data dumps such as this example do not belong in data warehouses; however, even important and necessary data can be hidden in such dumps, and Tableau can be effectively used to extract it.

Summary

     We began this chapter with a discussion of the Tableau data-handling engine. This illustrated the flexibility Tableau provides in working with data. The data-handling engine is important to understand in order to ensure that your data mining efforts are intelligently focused. Otherwise, your effort may be wasted on activities not relevant to Tableau. 

     Next, we discussed data mining and knowledge discovery process models, with an emphasis on CRISP-DM. The purpose of this discussion was to get an appropriate bird's-eye view of the scope of the entire data mining effort. Tableau authors (and certainly end users) can become so focused on the reporting produced in the deployment phase that they end up forgetting or short-changing the other phases, particularly data preparation.

     Our last focus in this chapter was on the phase that can be the most time-consuming and labor-intensive, namely data preparation. We considered using Tableau for surveying and also cleaning data. The data cleaning capabilities represented by the regular expression functions are particularly intriguing, and are worth further investigation. 

     Having completed our first data-centric discussion, we'll continue with Chapter 3, Tableau Prep Builder, looking at one of the newer features Tableau has brought to the market. Tableau Prep Builder is a dedicated data pre-processing interface that is able to reduce the amount of time you need for pre-processing even more. We'll take a look at cleaning, merging, filtering, joins, and the other functionality Tableau Prep Builder has to offer.
