So, here's the situation: I have a CSV file that looks like this:
show_id title cast 1 Batman Robert Pattinson, Collin Farrel, Zoë Kravitz 2 Twilight Robert Pattinson, Kristen Stewart 3 Ava Jessica Chastain, Collin Farrel
What I need to do is open this CSV file in a Python function and do some processing on spaces etc.
Then I need to upload it into a SQL database (anything I want, but I chose MySQL), which is no problem.
question
My main problem is that I then need (as my challenge requires it) to create a query that counts the number of times an actor appears in all movies in a list. So, in this case, the query should look like this:
演员 电影数量 Robert Pattinson 2 Collin Farrel 2 Zoë Kravitz 1 Kristen Stewart 1 Jessica Chastain 1
As you can see, I don't have a name that I can search with LIKE or CONTAINS. So, what should I do? Because in the CSV, the cast list for each movie has more than one actor, and I'm saving them in a varchar or text type in the database, I have multiple actors per row.
Or should I create another table with a foreign key for actors and movies? Or is it not possible to do this in MySQL, but maybe possible in other SQLs?
P粉0806439752023-09-11 16:09:29
If you are looking for something performance efficient, you should split the data (create 3 tables in total - movies, actors, credits) and use credits to connect the actors with the movies and then write something with the join Simple SQL, as shown below:
Select actors.name as Actor, count(movies.title) as Amount_of_movies from actors, inner join cast on cast.actor_id = actors.actor_id inner join movies on movies.movie_id = cast.movie_id;
You can also do it another way using https://sebhastian.com/mysql-split-string/ or using psql/plsql and stored procedures. I would just split the data if possible.
cheers.