Data Science 100 Knocks (Structured Data Processing) – SQL part3 (Q41 to Q60)

Commentary :

This SQL code uses a WITH clause to define two common table expressions (CTEs) named "sales_amount_by_date" and "sales_amount_lag_date". It then performs a self-join on "sales_amount_lag_date" and filters the result with the join conditions described below.

More specifically, the code is performing the following operations:

The first CTE, "sales_amount_by_date", calculates the total sales amount for each sales date in the "receipt" table.

The second CTE, "sales_amount_lag_date", uses the "LAG" window function with an offset of 3 to attach to each row the sales date three rows earlier (ordered by date), alongside that date's total sales amount.

The final "SELECT" statement joins "sales_amount_lag_date" with itself. The join condition keeps only pairs of rows where the second row's sales date is earlier than the first row's, but no earlier than the date three rows back. In other words, each sales date is paired with each of its three immediately preceding sales dates.

The results are sorted in ascending order by sales date and by previous sales date, and limited to the first 10 rows.

In summary, the code pairs each sales date in the "receipt" table with its three preceding sales dates and displays, for the first 10 rows, the sales date and sales amount alongside each previous sales date and its sales amount.
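The description above corresponds to a query along the following lines. This is a sketch reconstructed from the commentary, not the original query; the column names sales_ymd and amount are assumed from the 100 Knocks receipt schema, and the syntax follows PostgreSQL:

```sql
WITH sales_amount_by_date AS (
    -- total sales amount per sales date
    SELECT
        sales_ymd,
        SUM(amount) AS amount
    FROM receipt
    GROUP BY sales_ymd
),
sales_amount_lag_date AS (
    -- attach the sales date three rows earlier (ordered by date)
    SELECT
        sales_ymd,
        LAG(sales_ymd, 3) OVER (ORDER BY sales_ymd) AS lag_date_3,
        amount
    FROM sales_amount_by_date
)
SELECT
    a.sales_ymd,
    a.amount,
    b.sales_ymd AS lag_date,
    b.amount AS lag_amount
FROM sales_amount_lag_date a
JOIN sales_amount_lag_date b
    -- pair each date with each of its three preceding sales dates
    ON b.sales_ymd < a.sales_ymd
   AND b.sales_ymd >= a.lag_date_3
ORDER BY a.sales_ymd, b.sales_ymd
LIMIT 10;
```

Note that the first three dates produce a NULL lag_date_3, so the join condition excludes them automatically.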
 
Commentary :

This SQL code selects the customer_id, address, and a prefecture_cd code for each customer from the customer table.

The CASE expression checks the first three characters of the address column (extracted with the SUBSTR function) and assigns a corresponding prefecture_cd value based on the following conditions:

If the first three characters of address are '埼玉県', prefecture_cd is set to '11'.
If the first three characters of address are '千葉県', prefecture_cd is set to '12'.
If the first three characters of address are '東京都', prefecture_cd is set to '13'.
If the first three characters of address are '神奈川' (the three-character prefix of '神奈川県'), prefecture_cd is set to '14'.

This code is useful when you want to categorize customers by their prefecture (region) based on their address information stored in the customer table. The resulting prefecture_cd column can be used for further analysis or grouping.
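A query matching this description would look roughly like the following. This is a hedged sketch based on the commentary, assuming the 100 Knocks customer schema:

```sql
SELECT
    customer_id,
    address,
    -- map the fixed three-character address prefix to a prefecture code
    CASE SUBSTR(address, 1, 3)
        WHEN '埼玉県' THEN '11'
        WHEN '千葉県' THEN '12'
        WHEN '東京都' THEN '13'
        WHEN '神奈川' THEN '14'  -- '神奈川県' is four characters, so only the prefix matches
    END AS prefecture_cd
FROM customer;
```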
 
Commentary :

This SQL code selects the customer_id, address, and a prefecture_cd code for each customer from the customer table.

The CASE expression in this code extracts the prefecture name from the address column using a regular expression pattern '^.*?[都道府県]' and assigns a corresponding prefecture_cd value based on the following conditions:

If the extracted prefecture name is '埼玉県', the prefecture_cd value is set to '11'.
If the extracted prefecture name is '千葉県', the prefecture_cd value is set to '12'.
If the extracted prefecture name is '東京都', the prefecture_cd value is set to '13'.
If the extracted prefecture name is '神奈川県', the prefecture_cd value is set to '14'.

This code is similar to the previous example, but it uses the SUBSTRING function with a regular expression to extract the prefecture name from the address column, instead of taking a fixed three-character prefix with the SUBSTR function. The regular expression pattern '^.*?[都道府県]' matches, from the start of the string, the shortest prefix ending in one of the characters 都, 道, 府, or 県. This approach is useful because prefecture names vary in length (for example, '神奈川県' is four characters), so a fixed-length prefix cannot capture every name cleanly. The resulting prefecture_cd column can be used for further analysis or grouping.
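The regex-based variant described above can be sketched as follows. This is a reconstruction under the same schema assumptions, using PostgreSQL's SUBSTRING(string FROM pattern) form, which interprets the pattern as a POSIX regular expression:

```sql
SELECT
    customer_id,
    address,
    -- extract the shortest leading substring ending in 都/道/府/県
    CASE SUBSTRING(address FROM '^.*?[都道府県]')
        WHEN '埼玉県'   THEN '11'
        WHEN '千葉県'   THEN '12'
        WHEN '東京都'   THEN '13'
        WHEN '神奈川県' THEN '14'  -- full four-character name now matches
    END AS prefecture_cd
FROM customer;
```

Because the full prefecture name is extracted, the fourth branch can compare against '神奈川県' directly rather than a truncated prefix.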
 
Commentary :

This SQL code performs a data analysis task on the "receipt" table, which contains transactional data with columns such as customer_id and amount.

The code uses a WITH clause to define two common table expressions (CTEs), sales_amount and stats_amount.

The first CTE, sales_amount, calculates the total amount spent by each customer whose customer_id does not start with the letter 'Z'. It groups the transaction data by customer_id and sums amount for each customer.

The second CTE, stats_amount, calculates the average and standard deviation of the per-customer totals computed in sales_amount.

The final SELECT statement selects columns from sales_amount and joins it with stats_amount using a cross join (stats_amount contains only one row, so this attaches the statistics to every customer row). It then standardizes each customer's total by subtracting the average and dividing by the standard deviation, i.e. it computes a z-score.

The LIMIT 10 at the end limits the output to the first 10 rows.

Overall, this code is performing a statistical analysis on the transactional data to find the standardized amount spent by each customer who doesn't have a customer_id starting with the letter 'Z'.
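The standardization described above can be sketched as the query below. This is a reconstruction from the commentary, assuming the 100 Knocks receipt schema and PostgreSQL's STDDEV aggregate (whether the original uses the sample or population standard deviation is not stated here):

```sql
WITH sales_amount AS (
    -- total spend per customer, excluding IDs starting with 'Z'
    SELECT
        customer_id,
        SUM(amount) AS sum_amount
    FROM receipt
    WHERE customer_id NOT LIKE 'Z%'
    GROUP BY customer_id
),
stats_amount AS (
    -- one row of summary statistics over the per-customer totals
    SELECT
        AVG(sum_amount)    AS avg_amount,
        STDDEV(sum_amount) AS std_amount
    FROM sales_amount
)
SELECT
    customer_id,
    sum_amount,
    -- z-score: (value - mean) / standard deviation
    (sum_amount - avg_amount) / std_amount AS std_sum_amount
FROM sales_amount
CROSS JOIN stats_amount
LIMIT 10;
```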
 
Commentary :

This code is also analyzing the receipt table, but instead of calculating the standardized amount, it is calculating the scaled amount spent by each customer.

The code is using two CTEs, sales_amount and stats_amount, which are similar to the previous code.

The first CTE calculates the total amount spent by each customer who has a customer_id that does not start with the letter 'Z', and the second CTE calculates the maximum and minimum total amount spent by any customer.

The final SELECT statement selects columns from the sales_amount table and joins it with the stats_amount table using a cross join.

It also calculates each customer's scaled amount: the minimum total is subtracted from the customer's total, and the result is divided by the range of totals, i.e. the difference between the maximum and minimum. This is min-max normalization.

The 1.0 * before the expression forces floating-point arithmetic: if all the operands are integers, the division would otherwise be integer division and the fractional part of the result would be truncated.

The LIMIT 10 at the end limits the output to the first 10 rows.

Overall, this code finds the scaled amount spent by each customer whose customer_id does not start with the letter 'Z'. The scaled amount is a value between 0 and 1 indicating where each customer's total falls within the range of totals across all customers: 0 for the minimum spender, 1 for the maximum.
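The min-max scaling described above can be sketched as follows, under the same schema assumptions as the standardization example:

```sql
WITH sales_amount AS (
    -- total spend per customer, excluding IDs starting with 'Z'
    SELECT
        customer_id,
        SUM(amount) AS sum_amount
    FROM receipt
    WHERE customer_id NOT LIKE 'Z%'
    GROUP BY customer_id
),
stats_amount AS (
    -- one row holding the maximum and minimum per-customer totals
    SELECT
        MAX(sum_amount) AS max_amount,
        MIN(sum_amount) AS min_amount
    FROM sales_amount
)
SELECT
    customer_id,
    sum_amount,
    -- min-max scaling; 1.0 * forces floating-point division
    1.0 * (sum_amount - min_amount) / (max_amount - min_amount) AS scale_amount
FROM sales_amount
CROSS JOIN stats_amount
LIMIT 10;
```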
 
Data Science 100 Knocks (Structured Data Processing) - SQL
This is an ipynb file originally created by The Data Scientist Society(データサイエンティスト協会スキル定義委員) and translated from Japanese to English by DeepL. The reason I updated this file is to spread this practice, which is useful for everyone who wants to practice SQL, from beginners to advanced engineers. Since this data is created for Japanese, you may face language problems when practicing. But do not worry, it will not affect much.