参照(Reference) : 「データサイエンティスト協会スキル定義委員」の「データサイエンス100本ノック(構造化データ加工編)」
The Data Scientist Society Github :
Data Science 100 Knocks (Structured Data Processing) URL :
-
Note: This is an ipynb file originally created by The Data Scientist Society(データサイエンティスト協会スキル定義委員) and translated from Japanese to English by DeepL.
-
The reason I updated this file is to spread this practice, which is useful for everyone who wants to practice SQL, from beginners to advanced engineers.
-
Since this data was created for Japanese, you may face language problems when practicing. But do not worry, it will not affect much.
You can get preprocess_knock files to practice python, SQL, R from my github account
Introduction
- The database is PostgreSQL13
- First, execute the following cell
- You can issue SQL by writing %%sql in the cell
- You cannot check the table structure with the describe command from jupyter, so if you want to check the table structure, use a SELECT with a limit, for example.
- You can use any familiar SQL client (connection information is as follows)
- IP address: localhost for Docker Desktop, 192.168.99.100 for Docker toolbox
- Port:5432
- Database name: dsdojo_db
- User name: padawan
- Password: padawan12345
- It is recommended to limit the number of outputs, as Jupyter may freeze if a large amount of outputs are performed (the number of outputs is also mentioned in the question).
- Data processing also requires the ability to control the amount of results to be displayed for confirmation and to make light work of them.
- If a large number of results are output, the file may become so heavy that it cannot be opened afterwards.
- In this case, the results of your work will be lost, but you can retrieve the file from GitHub.
- You can also delete large output ranges using a vim editor, for example.
- Names, addresses etc. are dummy data and do not exist.
%load_ext sql
import os
pgconfig = {
'host': 'db',
'port': os.environ['PG_PORT'],
'database': os.environ['PG_DATABASE'],
'user': os.environ['PG_USER'],
'password': os.environ['PG_PASSWORD'],
}
dsl = 'postgresql://{user}:{password}@{host}:{port}/{database}'.format(**pgconfig)
# Set up Magic commands to write SQL
%sql $dsl
'Connected: padawan@dsdojo_db'
How to use
- You can execute SQL from Jupyter against PostgreSQL by writing %%sql at the beginning of the cell and SQL in the second and subsequent lines.
%%sql
SELECT 'You can run it like this' AS sample;
* postgresql://padawan:***@db:5432/dsdojo_db 1 rows affected.
sample |
---|
You can run it like this |
100 data processing knocks
S-001: Display the first 10 items of all items from the receipt details data (df_receipt) and visually check what data is held.
%%sql
SELECT
*
FROM
receipt
LIMIT
10
;
* postgresql://padawan:***@db:5432/dsdojo_db 10 rows affected.
sales_ymd | sales_epoch | store_cd | receipt_no | receipt_sub_no | customer_id | product_cd | quantity | amount |
---|---|---|---|---|---|---|---|---|
20181103 | 1541203200 | S14006 | 112 | 1 | CS006214000001 | P070305012 | 1 | 158 |
20181118 | 1542499200 | S13008 | 1132 | 2 | CS008415000097 | P070701017 | 1 | 81 |
20170712 | 1499817600 | S14028 | 1102 | 1 | CS028414000014 | P060101005 | 1 | 170 |
20190205 | 1549324800 | S14042 | 1132 | 1 | ZZ000000000000 | P050301001 | 1 | 25 |
20180821 | 1534809600 | S14025 | 1102 | 2 | CS025415000050 | P060102007 | 1 | 90 |
20190605 | 1559692800 | S13003 | 1112 | 1 | CS003515000195 | P050102002 | 1 | 138 |
20181205 | 1543968000 | S14024 | 1102 | 2 | CS024514000042 | P080101005 | 1 | 30 |
20190922 | 1569110400 | S14040 | 1102 | 1 | CS040415000178 | P070501004 | 1 | 128 |
20170504 | 1493856000 | S13020 | 1112 | 2 | ZZ000000000000 | P071302010 | 1 | 770 |
20191010 | 1570665600 | S14027 | 1102 | 1 | CS027514000015 | P071101003 | 1 | 680 |
Commentary :
This code is written in SQL (Structured Query Language), a programming language used for managing and manipulating relational databases.
The code is querying a table named "receipt" and selecting all columns and rows from the table using the asterisk (*) wildcard character. The "LIMIT 10" clause at the end of the query limits the result set to the first 10 rows of the table.
In simpler terms, this code is retrieving the first 10 rows of data from the "receipt" table in a database. The exact meaning and structure of the data in the "receipt" table depends on the database schema and the information stored in it.
S-002: Display 10 items from the receipt details data (df_receipt), specifying the columns in the order of sales_date (sales_ymd), customer_id (customer_id), product_code (product_cd) and sales amount (amount).
%%sql
SELECT
sales_ymd,
customer_id,
product_cd,
amount
FROM
receipt
LIMIT
10
;
* postgresql://padawan:***@db:5432/dsdojo_db 10 rows affected.
sales_ymd | customer_id | product_cd | amount |
---|---|---|---|
20181103 | CS006214000001 | P070305012 | 158 |
20181118 | CS008415000097 | P070701017 | 81 |
20170712 | CS028414000014 | P060101005 | 170 |
20190205 | ZZ000000000000 | P050301001 | 25 |
20180821 | CS025415000050 | P060102007 | 90 |
20190605 | CS003515000195 | P050102002 | 138 |
20181205 | CS024514000042 | P080101005 | 30 |
20190922 | CS040415000178 | P070501004 | 128 |
20170504 | ZZ000000000000 | P071302010 | 770 |
20191010 | CS027514000015 | P071101003 | 680 |
Commentary :
This code is also written in SQL (Structured Query Language) and it is querying a table named "receipt".
The specific columns "sales_ymd", "customer_id", "product_cd", and "amount" are selected using the SELECT statement. These columns represent various pieces of information related to a purchase transaction, such as the date of the sale, the ID of the customer who made the purchase, the code of the product that was purchased, and the amount of the purchase.
The "LIMIT 10" clause at the end of the query limits the result set to the first 10 rows of the table, which means that the query will only return data for the first 10 transactions recorded in the "receipt" table.
In summary, this code is retrieving specific pieces of information about the first 10 transactions recorded in the "receipt" table in a database, using the columns "sales_ymd", "customer_id", "product_cd", and "amount". The exact meaning and structure of the data in the "receipt" table depends on the database schema and the information stored in it.
S-003: From the receipt details data (df_receipt), specify the columns in order of sales_date (sales_ymd), customer_id (customer_id), product_code (product_cd) and sales_amount (amount) and display 10 items. However, extract while renaming the item from sales_ymd to sales_date.
%%sql
SELECT
sales_ymd AS sales_date,
customer_id,
product_cd,
amount
FROM
receipt
LIMIT
10
;
* postgresql://padawan:***@db:5432/dsdojo_db 10 rows affected.
sales_date | customer_id | product_cd | amount |
---|---|---|---|
20181103 | CS006214000001 | P070305012 | 158 |
20181118 | CS008415000097 | P070701017 | 81 |
20170712 | CS028414000014 | P060101005 | 170 |
20190205 | ZZ000000000000 | P050301001 | 25 |
20180821 | CS025415000050 | P060102007 | 90 |
20190605 | CS003515000195 | P050102002 | 138 |
20181205 | CS024514000042 | P080101005 | 30 |
20190922 | CS040415000178 | P070501004 | 128 |
20170504 | ZZ000000000000 | P071302010 | 770 |
20191010 | CS027514000015 | P071101003 | 680 |
Commentary :
This code is also written in SQL (Structured Query Language) and it is querying a table named "receipt".
The SELECT statement is used to retrieve specific columns of data from the "receipt" table. However, in this case, the column names are being modified using the AS keyword. The first column is being renamed from "sales_ymd" to "sales_date". The other columns, "customer_id", "product_cd", and "amount", are being retained with their original names.
The "LIMIT 10" clause at the end of the query limits the result set to the first 10 rows of the table, which means that the query will only return data for the first 10 transactions recorded in the "receipt" table.
In summary, this code is retrieving specific columns of information about the first 10 transactions recorded in the "receipt" table in a database, using the modified column names "sales_date", "customer_id", "product_cd", and "amount". The exact meaning and structure of the data in the "receipt" table depends on the database schema and the information stored in it.
S-004: From receipt details data (df_receipt), specify columns in the order of sales_date (sales_ymd), customer ID (customer_id), product code (product_cd) and sales amount (amount) and extract data satisfying the following conditions.
- Customer ID (customer_id) is “CS018205000001”
%%sql
SELECT
sales_ymd,
customer_id,
product_cd,
amount
FROM
receipt
WHERE
customer_id = 'CS018205000001'
;
* postgresql://padawan:***@db:5432/dsdojo_db 12 rows affected.
sales_ymd | customer_id | product_cd | amount |
---|---|---|---|
20180911 | CS018205000001 | P071401012 | 2200 |
20180414 | CS018205000001 | P060104007 | 600 |
20170614 | CS018205000001 | P050206001 | 990 |
20170614 | CS018205000001 | P060702015 | 108 |
20190216 | CS018205000001 | P071005024 | 102 |
20180414 | CS018205000001 | P071101002 | 278 |
20190226 | CS018205000001 | P070902035 | 168 |
20190924 | CS018205000001 | P060805001 | 495 |
20190226 | CS018205000001 | P071401020 | 2200 |
20180911 | CS018205000001 | P071401005 | 1100 |
20190216 | CS018205000001 | P040101002 | 218 |
20190924 | CS018205000001 | P091503001 | 280 |
Commentary :
This code is also written in SQL (Structured Query Language) and it is querying a table named "receipt".
The SELECT statement is used to retrieve specific columns of data from the "receipt" table. The columns being selected are "sales_ymd", "customer_id", "product_cd", and "amount". These columns represent various pieces of information related to a purchase transaction, such as the date of the sale, the ID of the customer who made the purchase, the code of the product that was purchased, and the amount of the purchase.
The FROM statement specifies the "receipt" table from which the data is being retrieved.
The WHERE clause is used to filter the data based on a specific condition. In this case, the data is being filtered based on the condition that the "customer_id" column is equal to 'CS018205000001'. This means that the query will only retrieve data for transactions that were made by the customer with ID 'CS018205000001'.
In summary, this code is retrieving specific columns of information from the "receipt" table in a database, but only for transactions made by a specific customer with the ID 'CS018205000001'. The exact meaning and structure of the data in the "receipt" table depends on the database schema and the information stored in it.
S-005: Extract data from receipt details data (df_receipt), specifying columns in the order of sales_date (sales_ymd), customer ID (customer_id), product code (product_cd) and sales amount (amount) and satisfying all the following conditions.
- Customer ID (customer_id) is “CS018205000001”.
- Sales amount (amount) is 1,000 or more.
%%sql
SELECT
sales_ymd,
customer_id,
product_cd,
amount
FROM
receipt
WHERE
customer_id = 'CS018205000001'
AND
amount >= 1000
;
* postgresql://padawan:***@db:5432/dsdojo_db 3 rows affected.
sales_ymd | customer_id | product_cd | amount |
---|---|---|---|
20180911 | CS018205000001 | P071401012 | 2200 |
20190226 | CS018205000001 | P071401020 | 2200 |
20180911 | CS018205000001 | P071401005 | 1100 |
Commentary :
This code is also written in SQL (Structured Query Language) and it is querying a table named "receipt".
The SELECT statement is used to retrieve specific columns of data from the "receipt" table. The columns being selected are "sales_ymd", "customer_id", "product_cd", and "amount". These columns represent various pieces of information related to a purchase transaction, such as the date of the sale, the ID of the customer who made the purchase, the code of the product that was purchased, and the amount of the purchase.
The FROM statement specifies the "receipt" table from which the data is being retrieved.
The WHERE clause is used to filter the data based on specific conditions. In this case, the data is being filtered based on two conditions:
The "customer_id" column must be equal to 'CS018205000001'.
The "amount" column must be greater than or equal to 1000.
This means that the query will only retrieve data for transactions that were made by the customer with ID 'CS018205000001' and have an amount greater than or equal to 1000.
In summary, this code is retrieving specific columns of information from the "receipt" table in a database, but only for transactions made by a specific customer with the ID 'CS018205000001' and with an amount greater than or equal to 1000. The exact meaning and structure of the data in the "receipt" table depends on the database schema and the information stored in it.
S-006: Extract data from receipt details data (df_receipt), specifying the columns in the order of sales_date (sales_ymd), customer ID (customer_id), product code (product_cd), sales quantity (quantity) and sales amount (amount) and satisfying all the following conditions.
- Customer ID (customer_id) is “CS018205000001”.
- Sales amount (amount) is 1,000 or more or sales quantity (quantity) is 5 or more.
%%sql
SELECT
sales_ymd,
customer_id,
product_cd,
quantity,
amount
FROM
receipt
WHERE
customer_id = 'CS018205000001'
AND
(
amount >= 1000 OR quantity >= 5
)
;
* postgresql://padawan:***@db:5432/dsdojo_db 5 rows affected.
sales_ymd | customer_id | product_cd | quantity | amount |
---|---|---|---|---|
20180911 | CS018205000001 | P071401012 | 1 | 2200 |
20180414 | CS018205000001 | P060104007 | 6 | 600 |
20170614 | CS018205000001 | P050206001 | 5 | 990 |
20190226 | CS018205000001 | P071401020 | 1 | 2200 |
20180911 | CS018205000001 | P071401005 | 1 | 1100 |
Commentary :
This code is also written in SQL (Structured Query Language) and it is querying a table named "receipt".
The SELECT statement is used to retrieve specific columns of data from the "receipt" table. The columns being selected are "sales_ymd", "customer_id", "product_cd", "quantity", and "amount". These columns represent various pieces of information related to a purchase transaction, such as the date of the sale, the ID of the customer who made the purchase, the code of the product that was purchased, the quantity of the product purchased, and the amount of the purchase.
The FROM statement specifies the "receipt" table from which the data is being retrieved.
The WHERE clause is used to filter the data based on specific conditions. In this case, the data is being filtered based on two conditions:
The "customer_id" column must be equal to 'CS018205000001'.
The transaction must meet at least one of the following two conditions: a. The "amount" column must be greater than or equal to 1000. b. The "quantity" column must be greater than or equal to 5.
This means that the query will only retrieve data for transactions that were made by the customer with ID 'CS018205000001' and either have an amount greater than or equal to 1000 or a quantity greater than or equal to 5.
In summary, this code is retrieving specific columns of information from the "receipt" table in a database, but only for transactions made by a specific customer with the ID 'CS018205000001' and with either an amount greater than or equal to 1000 or a quantity greater than or equal to 5. The exact meaning and structure of the data in the "receipt" table depends on the database schema and the information stored in it.
S-007: From the receipt details data (df_receipt), specify the columns in order of sales date (sales_ymd), customer ID (customer_id), product code (product_cd) and sales amount (amount) and extract data that satisfy all the following conditions.
- Customer ID (customer_id) is “CS018205000001”.
- Sales amount (amount) is between 1,000 and 2,000.
%%sql
SELECT
sales_ymd,
customer_id,
product_cd,
amount
FROM
receipt
WHERE
customer_id = 'CS018205000001'
AND
amount BETWEEN 1000 AND 2000
;
* postgresql://padawan:***@db:5432/dsdojo_db 1 rows affected.
sales_ymd | customer_id | product_cd | amount |
---|---|---|---|
20180911 | CS018205000001 | P071401005 | 1100 |
Commentary :
This code is also written in SQL (Structured Query Language) and it is querying a table named "receipt".
The SELECT statement is used to retrieve specific columns of data from the "receipt" table. The columns being selected are "sales_ymd", "customer_id", "product_cd", and "amount". These columns represent various pieces of information related to a purchase transaction, such as the date of the sale, the ID of the customer who made the purchase, the code of the product that was purchased, and the amount of the purchase.
The FROM statement specifies the "receipt" table from which the data is being retrieved.
The WHERE clause is used to filter the data based on specific conditions. In this case, the data is being filtered based on two conditions:
The "customer_id" column must be equal to 'CS018205000001'.
The "amount" column must be between 1000 and 2000, inclusive.
This means that the query will only retrieve data for transactions that were made by the customer with ID 'CS018205000001' and have an amount between 1000 and 2000, inclusive.
The BETWEEN operator in the WHERE clause is used to filter results based on a range of values. In this case, it is used to retrieve transactions where the "amount" column falls between 1000 and 2000, inclusive.
In summary, this code is retrieving specific columns of information from the "receipt" table in a database, but only for transactions made by a specific customer with the ID 'CS018205000001' and with an amount between 1000 and 2000, inclusive. The exact meaning and structure of the data in the "receipt" table depends on the database schema and the information stored in it.
S-008: From the receipt details data (df_receipt), specify the columns in order of sales date (sales_ymd), customer ID (customer_id), product code (product_cd) and sales amount (amount), and extract data satisfying all the following conditions.
- Customer ID (customer_id) is “CS018205000001”.
- Product code (product_cd) is other than “P071401019”.
%%sql
SELECT
sales_ymd,
customer_id,
product_cd, amount
FROM
receipt
WHERE
customer_id = 'CS018205000001'
AND
product_cd != 'P071401019'
;
* postgresql://padawan:***@db:5432/dsdojo_db 12 rows affected.
sales_ymd | customer_id | product_cd | amount |
---|---|---|---|
20180911 | CS018205000001 | P071401012 | 2200 |
20180414 | CS018205000001 | P060104007 | 600 |
20170614 | CS018205000001 | P050206001 | 990 |
20170614 | CS018205000001 | P060702015 | 108 |
20190216 | CS018205000001 | P071005024 | 102 |
20180414 | CS018205000001 | P071101002 | 278 |
20190226 | CS018205000001 | P070902035 | 168 |
20190924 | CS018205000001 | P060805001 | 495 |
20190226 | CS018205000001 | P071401020 | 2200 |
20180911 | CS018205000001 | P071401005 | 1100 |
20190216 | CS018205000001 | P040101002 | 218 |
20190924 | CS018205000001 | P091503001 | 280 |
Commentary :
This code is written in SQL language and is querying a database.
The query is using the SELECT statement to select data from the "receipt" table.
The columns that are being selected are: "sales_ymd", "customer_id", "product_cd", and "amount".
The query is filtering the data by only selecting records where the "customer_id" is equal to 'CS018205000001' AND the "product_cd" is NOT equal to 'P071401019'.
This means that the query will return all records for customer_id 'CS018205000001' except for records where the product_cd is 'P071401019'.
This code is being executed within a Jupyter notebook using the SQL magic command "%%sql". This command allows SQL code to be executed directly within the notebook.
S-009: In the following process, rewrite OR to AND without changing the output result.
SELECT * FROM store WHERE NOT (prefecture_cd = '13' OR floor_area > 900)
%%sql
SELECT
*
FROM
store
WHERE
prefecture_cd != '13'
AND
floor_area <= 900
;
* postgresql://padawan:***@db:5432/dsdojo_db 3 rows affected.
store_cd | store_name | prefecture_cd | prefecture | address | address_kana | tel_no | longitude | latitude | floor_area |
---|---|---|---|---|---|---|---|---|---|
S14046 | 北山田店 | 14 | 神奈川県 | 神奈川県横浜市都筑区北山田一丁目 | カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ | 045-123-4049 | 139.5916 | 35.56189 | 831.0 |
S14011 | 日吉本町店 | 14 | 神奈川県 | 神奈川県横浜市港北区日吉本町四丁目 | カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ | 045-123-4033 | 139.6316 | 35.54655 | 890.0 |
S12013 | 習志野店 | 12 | 千葉県 | 千葉県習志野市芝園一丁目 | チバケンナラシノシシバゾノイッチョウメ | 047-123-4002 | 140.022 | 35.66122 | 808.0 |
Commentary :
This code is written in SQL language and is querying a database.
The query is using the SELECT statement to select all columns from the "store" table.
The query is filtering the data by only selecting records where the "prefecture_cd" is NOT equal to '13' AND the "floor_area" is less than or equal to 900.
This means that the query will return all records from the "store" table where the "prefecture_cd" is not equal to 13 and the "floor_area" is less than or equal to 900.
This code is being executed within a Jupyter notebook using the SQL magic command "%%sql". This command allows SQL code to be executed directly within the notebook.
S-010: From the shop data (df_store), extract all items whose shop code (store_cd) starts with “S14” and display 10 items.
%%sql
SELECT
*
FROM
store
WHERE
store_cd LIKE 'S14%'
LIMIT
10
;
* postgresql://padawan:***@db:5432/dsdojo_db 10 rows affected.
store_cd | store_name | prefecture_cd | prefecture | address | address_kana | tel_no | longitude | latitude | floor_area |
---|---|---|---|---|---|---|---|---|---|
S14010 | 菊名店 | 14 | 神奈川県 | 神奈川県横浜市港北区菊名一丁目 | カナガワケンヨコハマシコウホククキクナイッチョウメ | 045-123-4032 | 139.6326 | 35.50049 | 1732.0 |
S14033 | 阿久和店 | 14 | 神奈川県 | 神奈川県横浜市瀬谷区阿久和西一丁目 | カナガワケンヨコハマシセヤクアクワニシイッチョウメ | 045-123-4043 | 139.4961 | 35.45918 | 1495.0 |
S14036 | 相模原中央店 | 14 | 神奈川県 | 神奈川県相模原市中央二丁目 | カナガワケンサガミハラシチュウオウニチョウメ | 042-123-4045 | 139.3716 | 35.57327 | 1679.0 |
S14040 | 長津田店 | 14 | 神奈川県 | 神奈川県横浜市緑区長津田みなみ台五丁目 | カナガワケンヨコハマシミドリクナガツタミナミダイゴチョウメ | 045-123-4046 | 139.4994 | 35.52398 | 1548.0 |
S14050 | 阿久和西店 | 14 | 神奈川県 | 神奈川県横浜市瀬谷区阿久和西一丁目 | カナガワケンヨコハマシセヤクアクワニシイッチョウメ | 045-123-4053 | 139.4961 | 35.45918 | 1830.0 |
S14028 | 二ツ橋店 | 14 | 神奈川県 | 神奈川県横浜市瀬谷区二ツ橋町 | カナガワケンヨコハマシセヤクフタツバシチョウ | 045-123-4042 | 139.4963 | 35.46304 | 1574.0 |
S14012 | 本牧和田店 | 14 | 神奈川県 | 神奈川県横浜市中区本牧和田 | カナガワケンヨコハマシナカクホンモクワダ | 045-123-4034 | 139.6582 | 35.42156 | 1341.0 |
S14046 | 北山田店 | 14 | 神奈川県 | 神奈川県横浜市都筑区北山田一丁目 | カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ | 045-123-4049 | 139.5916 | 35.56189 | 831.0 |
S14022 | 逗子店 | 14 | 神奈川県 | 神奈川県逗子市逗子一丁目 | カナガワケンズシシズシイッチョウメ | 046-123-4036 | 139.5789 | 35.29642 | 1838.0 |
S14011 | 日吉本町店 | 14 | 神奈川県 | 神奈川県横浜市港北区日吉本町四丁目 | カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ | 045-123-4033 | 139.6316 | 35.54655 | 890.0 |
Commentary :
This code is written in SQL language and is querying a database.
The query is using the SELECT statement to select all columns from the "store" table.
The query is filtering the data by only selecting records where the "store_cd" column starts with the letters 'S14' using the LIKE operator with the wildcard character '%'.
The query is also using the LIMIT clause to limit the number of returned records to 10.
This means that the query will return up to 10 records from the "store" table where the "store_cd" starts with 'S14'.
This code is being executed within a Jupyter notebook using the SQL magic command "%%sql". This command allows SQL code to be executed directly within the notebook.
S-011: From the customer data (df_customer), extract all items whose customer ID (customer_id) ends with 1, and display 10 items.
%%sql
SELECT
*
FROM
customer
WHERE
customer_id LIKE '%1'
LIMIT
10
;
* postgresql://padawan:***@db:5432/dsdojo_db 10 rows affected.
customer_id | customer_name | gender_cd | gender | birth_day | age | postal_cd | address | application_store_cd | application_date | status_cd |
---|---|---|---|---|---|---|---|---|---|---|
CS037613000071 | 六角 雅彦 | 9 | 不明 | 1952-04-01 | 66 | 136-0076 | 東京都江東区南砂********** | S13037 | 20150414 | 0-00000000-0 |
CS028811000001 | 堀井 かおり | 1 | 女性 | 1933-03-27 | 86 | 245-0016 | 神奈川県横浜市泉区和泉町********** | S14028 | 20160115 | 0-00000000-0 |
CS040412000191 | 川井 郁恵 | 1 | 女性 | 1977-01-05 | 42 | 226-0021 | 神奈川県横浜市緑区北八朔町********** | S14040 | 20151101 | 1-20091025-4 |
CS028314000011 | 小菅 あおい | 1 | 女性 | 1983-11-26 | 35 | 246-0038 | 神奈川県横浜市瀬谷区宮沢********** | S14028 | 20151123 | 1-20080426-5 |
CS039212000051 | 藤島 恵梨香 | 1 | 女性 | 1997-02-03 | 22 | 166-0001 | 東京都杉並区阿佐谷北********** | S13039 | 20171121 | 1-20100215-4 |
CS015412000111 | 松居 奈月 | 1 | 女性 | 1972-10-04 | 46 | 136-0071 | 東京都江東区亀戸********** | S13015 | 20150629 | 0-00000000-0 |
CS004702000041 | 野島 洋 | 0 | 男性 | 1943-08-24 | 75 | 176-0022 | 東京都練馬区向山********** | S13004 | 20170218 | 0-00000000-0 |
CS041515000001 | 栗田 千夏 | 1 | 女性 | 1967-01-02 | 52 | 206-0001 | 東京都多摩市和田********** | S13041 | 20160422 | E-20100803-F |
CS029313000221 | 北条 ひかり | 1 | 女性 | 1987-06-19 | 31 | 279-0011 | 千葉県浦安市美浜********** | S12029 | 20180810 | 0-00000000-0 |
CS034312000071 | 望月 奈央 | 1 | 女性 | 1980-09-20 | 38 | 213-0026 | 神奈川県川崎市高津区久末********** | S14034 | 20160106 | 0-00000000-0 |
Commentary :
This code is written in SQL language and is querying a database.
The query is using the SELECT statement to select all columns from the "customer" table.
The query is filtering the data by only selecting records where the "customer_id" column ends with the number '1' using the LIKE operator with the wildcard character '%'.
The query is also using the LIMIT clause to limit the number of returned records to 10.
This means that the query will return up to 10 records from the "customer" table where the "customer_id" ends with the number '1'.
This code is being executed within a Jupyter notebook using the SQL magic command "%%sql". This command allows SQL code to be executed directly within the notebook.
S-012: From shop data (df_store), display all items with “横浜市(Yokohama-city)” in the address.
%%sql
SELECT
*
FROM
store
WHERE
address LIKE '%横浜市%'
;
* postgresql://padawan:***@db:5432/dsdojo_db 11 rows affected.
store_cd | store_name | prefecture_cd | prefecture | address | address_kana | tel_no | longitude | latitude | floor_area |
---|---|---|---|---|---|---|---|---|---|
S14010 | 菊名店 | 14 | 神奈川県 | 神奈川県横浜市港北区菊名一丁目 | カナガワケンヨコハマシコウホククキクナイッチョウメ | 045-123-4032 | 139.6326 | 35.50049 | 1732.0 |
S14033 | 阿久和店 | 14 | 神奈川県 | 神奈川県横浜市瀬谷区阿久和西一丁目 | カナガワケンヨコハマシセヤクアクワニシイッチョウメ | 045-123-4043 | 139.4961 | 35.45918 | 1495.0 |
S14040 | 長津田店 | 14 | 神奈川県 | 神奈川県横浜市緑区長津田みなみ台五丁目 | カナガワケンヨコハマシミドリクナガツタミナミダイゴチョウメ | 045-123-4046 | 139.4994 | 35.52398 | 1548.0 |
S14050 | 阿久和西店 | 14 | 神奈川県 | 神奈川県横浜市瀬谷区阿久和西一丁目 | カナガワケンヨコハマシセヤクアクワニシイッチョウメ | 045-123-4053 | 139.4961 | 35.45918 | 1830.0 |
S14028 | 二ツ橋店 | 14 | 神奈川県 | 神奈川県横浜市瀬谷区二ツ橋町 | カナガワケンヨコハマシセヤクフタツバシチョウ | 045-123-4042 | 139.4963 | 35.46304 | 1574.0 |
S14012 | 本牧和田店 | 14 | 神奈川県 | 神奈川県横浜市中区本牧和田 | カナガワケンヨコハマシナカクホンモクワダ | 045-123-4034 | 139.6582 | 35.42156 | 1341.0 |
S14046 | 北山田店 | 14 | 神奈川県 | 神奈川県横浜市都筑区北山田一丁目 | カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ | 045-123-4049 | 139.5916 | 35.56189 | 831.0 |
S14011 | 日吉本町店 | 14 | 神奈川県 | 神奈川県横浜市港北区日吉本町四丁目 | カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ | 045-123-4033 | 139.6316 | 35.54655 | 890.0 |
S14048 | 中川中央店 | 14 | 神奈川県 | 神奈川県横浜市都筑区中川中央二丁目 | カナガワケンヨコハマシツヅキクナカガワチュウオウニチョウメ | 045-123-4051 | 139.5758 | 35.54912 | 1657.0 |
S14042 | 新山下店 | 14 | 神奈川県 | 神奈川県横浜市中区新山下二丁目 | カナガワケンヨコハマシナカクシンヤマシタニチョウメ | 045-123-4047 | 139.6593 | 35.43894 | 1044.0 |
S14006 | 葛が谷店 | 14 | 神奈川県 | 神奈川県横浜市都筑区葛が谷 | カナガワケンヨコハマシツヅキククズガヤ | 045-123-4031 | 139.5633 | 35.53573 | 1886.0 |
Commentary :
This code is written in SQL language and is querying a database.
The query is using the SELECT statement to select all columns from the "store" table.
The query is filtering the data by only selecting records where the "address" column contains the string '横浜市' using the LIKE operator with the wildcard character '%'.
This means that the query will return all records from the "store" table where the "address" column contains the string '横浜市'.
This code is being executed within a Jupyter notebook using the SQL magic command "%%sql". This command allows SQL code to be executed directly within the notebook.
S-013: From the customer data (df_customer), extract all data whose status code (status_cd) begins with the letters A to F and display 10 items.
%%sql
SELECT
*
FROM
customer
WHERE
status_cd ~ '^[A-F]'
LIMIT
10
;
* postgresql://padawan:***@db:5432/dsdojo_db 10 rows affected.
customer_id | customer_name | gender_cd | gender | birth_day | age | postal_cd | address | application_store_cd | application_date | status_cd |
---|---|---|---|---|---|---|---|---|---|---|
CS031415000172 | 宇多田 貴美子 | 1 | 女性 | 1976-10-04 | 42 | 151-0053 | 東京都渋谷区代々木********** | S13031 | 20150529 | D-20100325-C |
CS015414000103 | 奥野 陽子 | 1 | 女性 | 1977-08-09 | 41 | 136-0073 | 東京都江東区北砂********** | S13015 | 20150722 | B-20100609-B |
CS011215000048 | 芦田 沙耶 | 1 | 女性 | 1992-02-01 | 27 | 223-0062 | 神奈川県横浜市港北区日吉本町********** | S14011 | 20150228 | C-20100421-9 |
CS029415000023 | 梅田 里穂 | 1 | 女性 | 1976-01-17 | 43 | 279-0043 | 千葉県浦安市富士見********** | S12029 | 20150610 | D-20100918-E |
CS035415000029 | 寺沢 真希 | 9 | 不明 | 1977-09-27 | 41 | 158-0096 | 東京都世田谷区玉川台********** | S13035 | 20141220 | F-20101029-F |
CS031415000106 | 宇野 由美子 | 1 | 女性 | 1970-02-26 | 49 | 151-0053 | 東京都渋谷区代々木********** | S13031 | 20150201 | F-20100511-E |
CS029215000025 | 石倉 美帆 | 1 | 女性 | 1993-09-28 | 25 | 279-0022 | 千葉県浦安市今川********** | S12029 | 20150708 | B-20100820-C |
CS033605000005 | 猪股 雄太 | 0 | 男性 | 1955-12-05 | 63 | 246-0031 | 神奈川県横浜市瀬谷区瀬谷********** | S14033 | 20150425 | F-20100917-E |
CS033415000229 | 板垣 菜々美 | 1 | 女性 | 1977-11-07 | 41 | 246-0021 | 神奈川県横浜市瀬谷区二ツ橋町********** | S14033 | 20150712 | F-20100326-E |
CS008415000145 | 黒谷 麻緒 | 1 | 女性 | 1977-06-27 | 41 | 157-0067 | 東京都世田谷区喜多見********** | S13008 | 20150829 | F-20100622-F |
Commentary :
This code is written in SQL language and is querying a database.
The query is using the SELECT statement to select all columns from the "customer" table.
The query is filtering the data by only selecting records where the "status_cd" column matches the regular expression pattern '^[A-F]' using the ~ operator.
The regular expression pattern '^[A-F]' matches any value in the "status_cd" column that starts with the letters A through F. The ^ character in the pattern indicates the start of the string, and the [A-F] specifies a character range.
The query is also using the LIMIT clause to limit the number of returned records to 10.
This means that the query will return up to 10 records from the "customer" table where the "status_cd" column starts with the letters A through F.
This code is being executed within a Jupyter notebook using the SQL magic command "%%sql". This command allows SQL code to be executed directly within the notebook.
S-014: From the customer data (df_customer), extract all data items whose status code (status_cd) ends with the numbers 1 to 9, and display 10 items.
%%sql
SELECT
*
FROM
customer
WHERE
status_cd ~ '[1-9]$'
LIMIT
10
;
* postgresql://padawan:***@db:5432/dsdojo_db 10 rows affected.
customer_id | customer_name | gender_cd | gender | birth_day | age | postal_cd | address | application_store_cd | application_date | status_cd |
---|---|---|---|---|---|---|---|---|---|---|
CS001215000145 | 田崎 美紀 | 1 | 女性 | 1995-03-29 | 24 | 144-0055 | 東京都大田区仲六郷********** | S13001 | 20170605 | 6-20090929-2 |
CS033513000180 | 安斎 遥 | 1 | 女性 | 1962-07-11 | 56 | 241-0823 | 神奈川県横浜市旭区善部町********** | S14033 | 20150728 | 6-20080506-5 |
CS011215000048 | 芦田 沙耶 | 1 | 女性 | 1992-02-01 | 27 | 223-0062 | 神奈川県横浜市港北区日吉本町********** | S14011 | 20150228 | C-20100421-9 |
CS040412000191 | 川井 郁恵 | 1 | 女性 | 1977-01-05 | 42 | 226-0021 | 神奈川県横浜市緑区北八朔町********** | S14040 | 20151101 | 1-20091025-4 |
CS009315000023 | 皆川 文世 | 1 | 女性 | 1980-04-15 | 38 | 154-0012 | 東京都世田谷区駒沢********** | S13009 | 20150319 | 5-20080322-1 |
CS015315000033 | 福士 璃奈子 | 1 | 女性 | 1983-03-17 | 36 | 135-0043 | 東京都江東区塩浜********** | S13015 | 20141024 | 4-20080219-3 |
CS023513000066 | 神戸 そら | 1 | 女性 | 1961-12-17 | 57 | 210-0005 | 神奈川県川崎市川崎区東田町********** | S14023 | 20150915 | 5-20100524-9 |
CS035513000134 | 市川 美帆 | 1 | 女性 | 1960-03-27 | 59 | 156-0053 | 東京都世田谷区桜********** | S13035 | 20150227 | 8-20100711-9 |
CS001515000263 | 高松 夏空 | 1 | 女性 | 1962-11-09 | 56 | 144-0051 | 東京都大田区西蒲田********** | S13001 | 20160812 | 1-20100804-1 |
CS040314000027 | 鶴田 きみまろ | 9 | 不明 | 1986-03-26 | 33 | 226-0027 | 神奈川県横浜市緑区長津田********** | S14040 | 20150122 | 2-20080426-4 |
Commentary :
This code is written in SQL language and is querying a database.
The query is using the SELECT statement to select all columns from the "customer" table.
The query is filtering the data by only selecting records where the "status_cd" column matches the regular expression pattern '[1-9]$' using the ~ operator.
The regular expression pattern '[1-9]$' matches any value in the "status_cd" column that ends with a digit from 1 through 9. The $ character in the pattern indicates the end of the string, and the [1-9] specifies a character range.
The query is also using the LIMIT clause to limit the number of returned records to 10.
This means that the query will return up to 10 records from the "customer" table where the "status_cd" column ends with a digit from 1 through 9.
This code is being executed within a Jupyter notebook using the SQL magic command "%%sql". This command allows SQL code to be executed directly within the notebook.
S-015: From the customer data (df_customer), extract all data items whose status code (status_cd) begins with the letters A to F and ends with the numbers 1 to 9, and display 10 items.
%%sql
SELECT
*
FROM
customer
WHERE
status_cd ~ '^[A-F].*[1-9]$'
LIMIT
10
;
* postgresql://padawan:***@db:5432/dsdojo_db 10 rows affected.
customer_id | customer_name | gender_cd | gender | birth_day | age | postal_cd | address | application_store_cd | application_date | status_cd |
---|---|---|---|---|---|---|---|---|---|---|
CS011215000048 | 芦田 沙耶 | 1 | 女性 | 1992-02-01 | 27 | 223-0062 | 神奈川県横浜市港北区日吉本町********** | S14011 | 20150228 | C-20100421-9 |
CS022513000105 | 島村 貴美子 | 1 | 女性 | 1962-03-12 | 57 | 249-0002 | 神奈川県逗子市山の根********** | S14022 | 20150320 | A-20091115-7 |
CS001515000096 | 水野 陽子 | 9 | 不明 | 1960-11-29 | 58 | 144-0053 | 東京都大田区蒲田本町********** | S13001 | 20150614 | A-20100724-7 |
CS013615000053 | 西脇 季衣 | 1 | 女性 | 1953-10-18 | 65 | 261-0026 | 千葉県千葉市美浜区幕張西********** | S12013 | 20150128 | B-20100329-6 |
CS020412000161 | 小宮 薫 | 1 | 女性 | 1974-05-21 | 44 | 174-0042 | 東京都板橋区東坂下********** | S13020 | 20150822 | B-20081021-3 |
CS001215000097 | 竹中 あさみ | 1 | 女性 | 1990-07-25 | 28 | 146-0095 | 東京都大田区多摩川********** | S13001 | 20170315 | A-20100211-2 |
CS035212000007 | 内村 恵梨香 | 1 | 女性 | 1990-12-04 | 28 | 152-0023 | 東京都目黒区八雲********** | S13035 | 20151013 | B-20101018-6 |
CS002515000386 | 野田 コウ | 1 | 女性 | 1963-05-30 | 55 | 185-0013 | 東京都国分寺市西恋ケ窪********** | S13002 | 20160410 | C-20100127-8 |
CS001615000372 | 稲垣 寿々花 | 1 | 女性 | 1956-10-29 | 62 | 144-0035 | 東京都大田区南蒲田********** | S13001 | 20170403 | A-20100104-1 |
CS032512000121 | 松井 知世 | 1 | 女性 | 1962-09-04 | 56 | 210-0011 | 神奈川県川崎市川崎区富士見********** | S13032 | 20150727 | A-20100103-5 |
Commentary :
This code is written in SQL language and is querying a database.
The query is using the SELECT statement to select all columns from the "customer" table.
The query is filtering the data by only selecting records where the "status_cd" column matches the regular expression pattern '^[A-F].*[1-9]$' using the ~ operator.
The regular expression pattern '^[A-F].*[1-9]$' matches any value in the "status_cd" column that starts with the letters A through F, followed by any number of characters, and ends with a digit from 1 through 9. The ^ character in the pattern indicates the start of the string, and the $ character indicates the end of the string.
The query is also using the LIMIT clause to limit the number of returned records to 10.
This means that the query will return up to 10 records from the "customer" table where the "status_cd" column starts with the letters A through F and ends with a digit from 1 through 9.
This code is being executed within a Jupyter notebook using the SQL magic command "%%sql". This command allows SQL code to be executed directly within the notebook.
S-016: From the shop data (df_store), display all items of data where the telephone number (tel_no) is 3 digits – 3 digits – 4 digits.
%%sql
SELECT
*
FROM
store
WHERE
tel_no ~ '^[0-9]{3}-[0-9]{3}-[0-9]{4}$'
;
* postgresql://padawan:***@db:5432/dsdojo_db 34 rows affected.
store_cd | store_name | prefecture_cd | prefecture | address | address_kana | tel_no | longitude | latitude | floor_area |
---|---|---|---|---|---|---|---|---|---|
S12014 | 千草台店 | 12 | 千葉県 | 千葉県千葉市稲毛区千草台一丁目 | チバケンチバシイナゲクチグサダイイッチョウメ | 043-123-4003 | 140.118 | 35.63559 | 1698.0 |
S13002 | 国分寺店 | 13 | 東京都 | 東京都国分寺市本多二丁目 | トウキョウトコクブンジシホンダニチョウメ | 042-123-4008 | 139.4802 | 35.70566 | 1735.0 |
S14010 | 菊名店 | 14 | 神奈川県 | 神奈川県横浜市港北区菊名一丁目 | カナガワケンヨコハマシコウホククキクナイッチョウメ | 045-123-4032 | 139.6326 | 35.50049 | 1732.0 |
S14033 | 阿久和店 | 14 | 神奈川県 | 神奈川県横浜市瀬谷区阿久和西一丁目 | カナガワケンヨコハマシセヤクアクワニシイッチョウメ | 045-123-4043 | 139.4961 | 35.45918 | 1495.0 |
S14036 | 相模原中央店 | 14 | 神奈川県 | 神奈川県相模原市中央二丁目 | カナガワケンサガミハラシチュウオウニチョウメ | 042-123-4045 | 139.3716 | 35.57327 | 1679.0 |
S14040 | 長津田店 | 14 | 神奈川県 | 神奈川県横浜市緑区長津田みなみ台五丁目 | カナガワケンヨコハマシミドリクナガツタミナミダイゴチョウメ | 045-123-4046 | 139.4994 | 35.52398 | 1548.0 |
S14050 | 阿久和西店 | 14 | 神奈川県 | 神奈川県横浜市瀬谷区阿久和西一丁目 | カナガワケンヨコハマシセヤクアクワニシイッチョウメ | 045-123-4053 | 139.4961 | 35.45918 | 1830.0 |
S13052 | 森野店 | 13 | 東京都 | 東京都町田市森野三丁目 | トウキョウトマチダシモリノサンチョウメ | 042-123-4030 | 139.4383 | 35.55293 | 1087.0 |
S14028 | 二ツ橋店 | 14 | 神奈川県 | 神奈川県横浜市瀬谷区二ツ橋町 | カナガワケンヨコハマシセヤクフタツバシチョウ | 045-123-4042 | 139.4963 | 35.46304 | 1574.0 |
S14012 | 本牧和田店 | 14 | 神奈川県 | 神奈川県横浜市中区本牧和田 | カナガワケンヨコハマシナカクホンモクワダ | 045-123-4034 | 139.6582 | 35.42156 | 1341.0 |
S14046 | 北山田店 | 14 | 神奈川県 | 神奈川県横浜市都筑区北山田一丁目 | カナガワケンヨコハマシツヅキクキタヤマタイッチョウメ | 045-123-4049 | 139.5916 | 35.56189 | 831.0 |
S14022 | 逗子店 | 14 | 神奈川県 | 神奈川県逗子市逗子一丁目 | カナガワケンズシシズシイッチョウメ | 046-123-4036 | 139.5789 | 35.29642 | 1838.0 |
S14011 | 日吉本町店 | 14 | 神奈川県 | 神奈川県横浜市港北区日吉本町四丁目 | カナガワケンヨコハマシコウホククヒヨシホンチョウヨンチョウメ | 045-123-4033 | 139.6316 | 35.54655 | 890.0 |
S13016 | 小金井店 | 13 | 東京都 | 東京都小金井市本町一丁目 | トウキョウトコガネイシホンチョウイッチョウメ | 042-123-4015 | 139.5094 | 35.70018 | 1399.0 |
S14034 | 川崎野川店 | 14 | 神奈川県 | 神奈川県川崎市宮前区野川 | カナガワケンカワサキシミヤマエクノガワ | 044-123-4044 | 139.5998 | 35.57693 | 1318.0 |
S14048 | 中川中央店 | 14 | 神奈川県 | 神奈川県横浜市都筑区中川中央二丁目 | カナガワケンヨコハマシツヅキクナカガワチュウオウニチョウメ | 045-123-4051 | 139.5758 | 35.54912 | 1657.0 |
S12007 | 佐倉店 | 12 | 千葉県 | 千葉県佐倉市上志津 | チバケンサクラシカミシヅ | 043-123-4001 | 140.1452 | 35.71872 | 1895.0 |
S14026 | 辻堂西海岸店 | 14 | 神奈川県 | 神奈川県藤沢市辻堂西海岸二丁目 | カナガワケンフジサワシツジドウニシカイガンニチョウメ | 046-123-4040 | 139.4466 | 35.32464 | 1732.0 |
S13041 | 八王子店 | 13 | 東京都 | 東京都八王子市大塚 | トウキョウトハチオウジシオオツカ | 042-123-4026 | 139.4235 | 35.63787 | 810.0 |
S14049 | 川崎大師店 | 14 | 神奈川県 | 神奈川県川崎市川崎区中瀬三丁目 | カナガワケンカワサキシカワサキクナカゼサンチョウメ | 044-123-4052 | 139.7327 | 35.53759 | 962.0 |
S14023 | 川崎店 | 14 | 神奈川県 | 神奈川県川崎市川崎区本町二丁目 | カナガワケンカワサキシカワサキクホンチョウニチョウメ | 044-123-4037 | 139.7028 | 35.53599 | 1804.0 |
S13018 | 清瀬店 | 13 | 東京都 | 東京都清瀬市松山一丁目 | トウキョウトキヨセシマツヤマイッチョウメ | 042-123-4017 | 139.5178 | 35.76885 | 1220.0 |
S14027 | 南藤沢店 | 14 | 神奈川県 | 神奈川県藤沢市南藤沢 | カナガワケンフジサワシミナミフジサワ | 046-123-4041 | 139.4896 | 35.33762 | 1521.0 |
S14021 | 伊勢原店 | 14 | 神奈川県 | 神奈川県伊勢原市伊勢原四丁目 | カナガワケンイセハラシイセハラヨンチョウメ | 046-123-4035 | 139.3129 | 35.40169 | 962.0 |
S14047 | 相模原店 | 14 | 神奈川県 | 神奈川県相模原市千代田六丁目 | カナガワケンサガミハラシチヨダロクチョウメ | 042-123-4050 | 139.3748 | 35.55959 | 1047.0 |
S12013 | 習志野店 | 12 | 千葉県 | 千葉県習志野市芝園一丁目 | チバケンナラシノシシバゾノイッチョウメ | 047-123-4002 | 140.022 | 35.66122 | 808.0 |
S14042 | 新山下店 | 14 | 神奈川県 | 神奈川県横浜市中区新山下二丁目 | カナガワケンヨコハマシナカクシンヤマシタニチョウメ | 045-123-4047 | 139.6593 | 35.43894 | 1044.0 |
S12030 | 八幡店 | 12 | 千葉県 | 千葉県市川市八幡三丁目 | チバケンイチカワシヤワタサンチョウメ | 047-123-4005 | 139.924 | 35.72318 | 1162.0 |
S14025 | 大和店 | 14 | 神奈川県 | 神奈川県大和市下和田 | カナガワケンヤマトシシモワダ | 046-123-4039 | 139.468 | 35.43414 | 1011.0 |
S14045 | 厚木店 | 14 | 神奈川県 | 神奈川県厚木市中町二丁目 | カナガワケンアツギシナカチョウニチョウメ | 046-123-4048 | 139.3651 | 35.44182 | 980.0 |
S12029 | 東野店 | 12 | 千葉県 | 千葉県浦安市東野一丁目 | チバケンウラヤスシヒガシノイッチョウメ | 047-123-4004 | 139.8968 | 35.65086 | 1101.0 |
S12053 | 高洲店 | 12 | 千葉県 | 千葉県浦安市高洲五丁目 | チバケンウラヤスシタカスゴチョウメ | 047-123-4006 | 139.9176 | 35.63755 | 1555.0 |
S14024 | 三田店 | 14 | 神奈川県 | 神奈川県川崎市多摩区三田四丁目 | カナガワケンカワサキシタマクミタヨンチョウメ | 044-123-4038 | 139.5424 | 35.6077 | 972.0 |
S14006 | 葛が谷店 | 14 | 神奈川県 | 神奈川県横浜市都筑区葛が谷 | カナガワケンヨコハマシツヅキククズガヤ | 045-123-4031 | 139.5633 | 35.53573 | 1886.0 |
Commentary :
This code is written in SQL language and is querying a database.
The query is using the SELECT statement to select all columns from the "store" table.
The query is filtering the data by only selecting records where the "tel_no" column matches the regular expression pattern '^[0-9]{3}-[0-9]{3}-[0-9]{4}$' using the ~ operator.
The regular expression pattern '^[0-9]{3}-[0-9]{3}-[0-9]{4}$' matches any value in the "tel_no" column that follows the format of a valid phone number with the format ###-###-####, where each # represents a digit. The ^ character in the pattern indicates the start of the string, and the $ character indicates the end of the string. The {3} and {4} are quantifiers that specify the exact number of digits in each section of the phone number.
This means that the query will return all records from the "store" table where the "tel_no" column matches the format of a valid phone number.
This code is being executed within a Jupyter notebook using the SQL magic command "%%sql". This command allows SQL code to be executed directly within the notebook.
S-017: Sort customer data (df_customer) by birth_day(birth_day) in order of age and display all 10 items from the top.
%%sql
SELECT
*
FROM
customer
ORDER BY
birth_day
LIMIT 10
;
* postgresql://padawan:***@db:5432/dsdojo_db 10 rows affected.
customer_id | customer_name | gender_cd | gender | birth_day | age | postal_cd | address | application_store_cd | application_date | status_cd |
---|---|---|---|---|---|---|---|---|---|---|
CS003813000014 | 村山 菜々美 | 1 | 女性 | 1928-11-26 | 90 | 182-0007 | 東京都調布市菊野台********** | S13003 | 20160214 | 0-00000000-0 |
CS026813000004 | 吉村 朝陽 | 1 | 女性 | 1928-12-14 | 90 | 251-0043 | 神奈川県藤沢市辻堂元町********** | S14026 | 20150723 | 0-00000000-0 |
CS018811000003 | 熊沢 美里 | 1 | 女性 | 1929-01-07 | 90 | 204-0004 | 東京都清瀬市野塩********** | S13018 | 20150403 | 0-00000000-0 |
CS027803000004 | 内村 拓郎 | 0 | 男性 | 1929-01-12 | 90 | 251-0031 | 神奈川県藤沢市鵠沼藤が谷********** | S14027 | 20151227 | 0-00000000-0 |
CS013801000003 | 天野 拓郎 | 0 | 男性 | 1929-01-15 | 90 | 274-0824 | 千葉県船橋市前原東********** | S12013 | 20160120 | 0-00000000-0 |
CS001814000022 | 鶴田 里穂 | 1 | 女性 | 1929-01-28 | 90 | 144-0045 | 東京都大田区南六郷********** | S13001 | 20161012 | A-20090415-7 |
CS016815000002 | 山元 美紀 | 1 | 女性 | 1929-02-22 | 90 | 184-0005 | 東京都小金井市桜町********** | S13016 | 20150629 | C-20090923-C |
CS009815000003 | 中田 里穂 | 1 | 女性 | 1929-04-08 | 89 | 154-0014 | 東京都世田谷区新町********** | S13009 | 20150421 | D-20091021-E |
CS012813000013 | 宇野 南朋 | 1 | 女性 | 1929-04-09 | 89 | 231-0806 | 神奈川県横浜市中区本牧町********** | S14012 | 20150712 | 0-00000000-0 |
CS005813000015 | 金谷 恵梨香 | 1 | 女性 | 1929-04-09 | 89 | 165-0032 | 東京都中野区鷺宮********** | S13005 | 20150506 | 0-00000000-0 |
Commentary :
This code is written in SQL language and is querying a database.
The query is using the SELECT statement to select all columns from the "customer" table.
The query is sorting the data in ascending order based on the "birth_day" column using the ORDER BY clause.
The LIMIT clause is used to limit the number of returned records to 10.
This means that the query will return the first 10 records from the "customer" table when the data is sorted in ascending order based on the "birth_day" column.
This code is being executed within a Jupyter notebook using the SQL magic command "%%sql". This command allows SQL code to be executed directly within the notebook.
S-018: Sort customer data (df_customer) by birth_day(birth_day) in descending order, displaying all 10 items from the top.
%%sql
SELECT
*
FROM
customer
ORDER BY
birth_day DESC
LIMIT
10
;
* postgresql://padawan:***@db:5432/dsdojo_db 10 rows affected.
customer_id | customer_name | gender_cd | gender | birth_day | age | postal_cd | address | application_store_cd | application_date | status_cd |
---|---|---|---|---|---|---|---|---|---|---|
CS035114000004 | 大村 美里 | 1 | 女性 | 2007-11-25 | 11 | 156-0053 | 東京都世田谷区桜********** | S13035 | 20150619 | 6-20091205-6 |
CS022103000002 | 福山 はじめ | 9 | 不明 | 2007-10-02 | 11 | 249-0006 | 神奈川県逗子市逗子********** | S14022 | 20160909 | 0-00000000-0 |
CS002113000009 | 柴田 真悠子 | 1 | 女性 | 2007-09-17 | 11 | 184-0014 | 東京都小金井市貫井南町********** | S13002 | 20160304 | 0-00000000-0 |
CS004115000014 | 松井 京子 | 1 | 女性 | 2007-08-09 | 11 | 165-0031 | 東京都中野区上鷺宮********** | S13004 | 20161120 | 1-20081231-1 |
CS002114000010 | 山内 遥 | 1 | 女性 | 2007-06-03 | 11 | 184-0015 | 東京都小金井市貫井北町********** | S13002 | 20160920 | 6-20100510-1 |
CS025115000002 | 小柳 夏希 | 1 | 女性 | 2007-04-18 | 11 | 245-0018 | 神奈川県横浜市泉区上飯田町********** | S14025 | 20160116 | D-20100913-D |
CS002113000025 | 広末 まなみ | 1 | 女性 | 2007-03-30 | 12 | 184-0015 | 東京都小金井市貫井北町********** | S13002 | 20171030 | 0-00000000-0 |
CS033112000003 | 長野 美紀 | 1 | 女性 | 2007-03-22 | 12 | 245-0051 | 神奈川県横浜市戸塚区名瀬町********** | S14033 | 20150606 | 0-00000000-0 |
CS007115000006 | 福岡 瞬 | 1 | 女性 | 2007-03-10 | 12 | 285-0845 | 千葉県佐倉市西志津********** | S12007 | 20151118 | F-20101016-F |
CS014113000008 | 矢口 莉緒 | 1 | 女性 | 2007-03-05 | 12 | 260-0041 | 千葉県千葉市中央区東千葉********** | S12014 | 20150622 | 3-20091108-6 |
Commentary :
This code is written in SQL language and is querying a database.
The query is using the SELECT statement to select all columns from the "customer" table.
The query is sorting the data in descending order based on the "birth_day" column using the ORDER BY clause.
The LIMIT clause is used to limit the number of returned records to 10.
This means that the query will return the first 10 records from the "customer" table when the data is sorted in descending order based on the "birth_day" column.
This code is being executed within a Jupyter notebook using the SQL magic command "%%sql". This command allows SQL code to be executed directly within the notebook.
S-019: Rank the receipt details data (df_receipt) in descending order of sales amount per item (amount) and display 10 items from the top. The items shall display the customer ID (customer_id), sales amount (amount) and the rank assigned. If the sales amount (amount) is equal, the same rank shall be assigned.
%%sql
SELECT
customer_id,
amount,
RANK() OVER(ORDER BY amount DESC) AS ranking
FROM
receipt
LIMIT
10
;
* postgresql://padawan:***@db:5432/dsdojo_db 10 rows affected.
customer_id | amount | ranking |
---|---|---|
CS011415000006 | 10925 | 1 |
ZZ000000000000 | 6800 | 2 |
CS028605000002 | 5780 | 3 |
ZZ000000000000 | 5480 | 4 |
ZZ000000000000 | 5480 | 4 |
CS015515000034 | 5480 | 4 |
ZZ000000000000 | 5440 | 7 |
CS021515000089 | 5440 | 7 |
CS020414000081 | 5280 | 9 |
ZZ000000000000 | 5280 | 9 |
Commentary :
This code is written in SQL language and is querying a database.
The query is selecting the "customer_id" and "amount" columns from the "receipt" table.
The query is also using a window function called RANK(), which is being used to calculate the ranking of each record based on the "amount" column. The RANK() function calculates the rank of each record by assigning the same rank to equal values and leaving gaps between the ranks for non-equal values. The window function is defined using the OVER() clause, which specifies the order in which to apply the ranking and any partitioning or filtering criteria.
The query is using the ORDER BY clause to order the data in descending order based on the "amount" column.
The query is using the AS keyword to alias the result of the RANK() function as "ranking".
The LIMIT clause is used to limit the number of returned records to 10.
This means that the query will return the first 10 records from the "receipt" table, with each record including the "customer_id", "amount", and "ranking" columns. The "ranking" column will contain the ranking of each record based on the "amount" column, with the highest amount having a ranking of 1.
This code is being executed within a Jupyter notebook using the SQL magic command "%%sql". This command allows SQL code to be executed directly within the notebook.
S-020: For receipt details data (df_receipt), assign ranks in descending order of sales amount (amount) per item and display the first 10 items. Items shall display the customer ID (customer_id), sales amount (amount) and the rank assigned. Even if the sales amount (amount) is equal, a different rank should be assigned.
%%sql
SELECT
customer_id,
amount,
ROW_NUMBER() OVER(ORDER BY amount DESC) AS ranking
FROM
receipt
LIMIT
10
;
* postgresql://padawan:***@db:5432/dsdojo_db 10 rows affected.
customer_id | amount | ranking |
---|---|---|
CS011415000006 | 10925 | 1 |
ZZ000000000000 | 6800 | 2 |
CS028605000002 | 5780 | 3 |
ZZ000000000000 | 5480 | 4 |
CS015515000034 | 5480 | 5 |
ZZ000000000000 | 5480 | 6 |
ZZ000000000000 | 5440 | 7 |
CS021515000089 | 5440 | 8 |
CS001412000160 | 5280 | 9 |
CS020515000102 | 5280 | 10 |
Commentary :
This code is a SQL query being executed in a Jupyter notebook using the magic command "%%sql".
The query selects three columns from the "receipt" table: "customer_id", "amount", and a new column "ranking". The "customer_id" and "amount" columns are straightforward and simply retrieve the values for these two fields. The "ranking" column is a derived column created using a window function called ROW_NUMBER().
The ROW_NUMBER() function generates a unique sequential number for each row in the result set based on the ORDER BY clause provided. In this case, the function is being used to assign a rank to each row based on the value in the "amount" column. The ORDER BY clause is ordering the rows in descending order based on the "amount" column, so the highest amount will be ranked 1, the second-highest amount will be ranked 2, and so on.
The AS keyword is used to give the new column an alias of "ranking".
Finally, the LIMIT clause is used to limit the number of rows returned by the query to 10.
Therefore, the result of this query will be the top 10 rows from the "receipt" table, with columns for "customer_id", "amount", and "ranking". The ranking column will contain a sequential number assigned to each row based on the order of the "amount" column, with the highest amount having a ranking of 1.
Comment