TDM 40100: Project 4 — 2023
Motivation: The ability to use SQL to query data in a relational database is an extremely useful skill. What is even more useful is the ability to build a sqlite3
database, design a schema, insert data, create indexes, etc. This series of projects is focused around SQL, sqlite3
, with the opportunity to use other skills you’ve built throughout the previous years.
Context: In TDM 20100 you had the opportunity to learn some basics of SQL, and likely worked (at least partially) with sqlite3
— a powerful database engine. In this project (and following projects), we will branch into SQL and sqlite3
-specific topics and techniques that you haven’t yet had exposure to in The Data Mine.
Scope: sqlite3
, lmod, SQL
Dataset(s)
The following questions will use the following dataset(s):
-
/anvil/projects/tdm/data/goodreads/datadownloadsAugust2023/goodreads_book_authors.json
-
/anvil/projects/tdm/data/goodreads/datadownloadsAugust2023/goodreads_book_series.json
-
/anvil/projects/tdm/data/goodreads/datadownloadsAugust2023/goodreads_books.json
-
/anvil/projects/tdm/data/goodreads/datadownloadsAugust2023/goodreads_reviews_dedup.json
Questions
Question 1 (1 pt)
-
Create a junction table between
books
andseries
calledbooks_series
. -
Write a list of each column in the
books_series
table, their data type, and a brief description of what they contain. -
Write an SQL query to find every book in a series with
series_id
12345.
Assume that a series can have many books and a book can be a part of many series. Following a similar process as in the previous project, create another junction table called books_series
.
In a markdown cell, list the columns in the books_series
table, their data type, and a brief description of what they represent.
Finally, write an SQL query to find every book in a series with series id
12345.
-
CREATE TABLE
statement forbooks_series
. -
List of columns in
books_series
with descriptions. -
SQL query to find every book in a series with
series id
12345.
Question 2 (2 pts)
-
Create a new database called
my_goodreads.db
that is the same as ourmy.db
database (after normalization), but with primary and foreign key constraints set.
As you may have noticed, we have determined the primary and foreign keys for our tables, but haven’t actually set them in the database yet. Unfortunately, sqlite3 has no way of setting these constraints after creating the able, so we will be rewriting our CREATE TABLE
statements inside a new database labeled my_goodreads.db
.
Going through your Jupyter notebooks from the previous project (or the beginning of this project) recreate the existing database we have made, but this time with primary and foreign key constraints set.
While doing this, remember to stick to the naming conventions and normalization we established in this project, and don’t forget to create your junction tables. It is particularly important to set primary and foreign key constraints in junction tables, as this gives us an added layer of data integrity when we are trying to modify data in our database.
If you are struggling or need a sanity check on your work, below is the number of constraints you will need to set for each table.
You can check the constraints on a table by running |
-
Modified
CREATE TABLE
statements for our new database, with proper constraints set.
Question 3 (2 pts)
-
Create a new junction table to handle
similar_books
. -
Create a new table and accompanying junction table to handle
popular_shelves
. -
Write 2+ sentences detailing an alternative approach to handling
popular_shelves
, and any benefits your approach has over the one provided. -
Write two queries that use all of the new tables created during this question.
The remaining two fields that need to be dealt with are similar_books
and popular_shelves
. These fields are a bit more complicated than the others, so you will need to think a bit more before implementing a solution. Remember to keep normalization in mind.
Firstly, for similar_books, I would recommend another junction table. As there is nothing to stop you from linking a table to itself, you can simply create a junction table that links books to each other. Write (and run) a CREATE TABLE
statement that creates a junction table between books
and books
called similar_books
. Paste your CREATE TABLE
statement into your Jupyter notebook for reference.
Next, for popular_shelves, let’s create two more tables. First, create a shelves
table with id
and name
columns. Choose data types appropriately. Next, create a junction table between shelves
and books
called books_shelves
with the appropriate column names and types. Be sure to normalize everything, as we have been doing throughout this project.
Now that we’ve created some rather straightforward approaches to handling these two data fields, do some more thinking on your own. Write at least two sentences in a markdown cell detaining a different approach to handling popular_shelves, and any benefits your approach has over this one. (Hint: composite keys!)
Finally, write two queries that use all of the new tables we created (one for similar_books
and one for popular_shelves
).
Ensure that all of your work is visible in your Jupyter notebook prior to submitting!
-
3
CREATE TABLE
statements for the new tables and junction tables. -
2+ sentences in a markdown cell detailing your alternative approach and its benefits over the provided approach.
-
2 queries that use all of the new tables created during this question.
Submitting your Work
Good work, you’ve made it to the end of your fourth project for TDM 401 and finished preparing your database for data entry! In the next project, we will begin populating our database with data! As always, ensure that all your work is visible as you expect in your submission to ensure you get the full points you deserve.
You must double check your You will not receive full credit if your |
-
firstname-lastname-project04.ipynb
.
Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted. In addition, please review our submission guidelines before submitting your project. |