Home  >  Q&A  >  body text

Method to count duplicate names in Varchar/Text column in MySQL/Any SQL

So, here's the situation: I have a CSV file that looks like this:

show_id   title      cast
1         Batman     Robert Pattinson, Collin Farrel, Zoë Kravitz
2         Twilight   Robert Pattinson, Kristen Stewart
3         Ava        Jessica Chastain, Collin Farrel

What I need to do is open this CSV file in a Python function and do some processing on spaces etc.

Then I need to upload it into a SQL database (anything I want, but I chose MySQL), which is no problem.

question

My main problem is that I then need (as my challenge requires it) to create a query that counts the number of times an actor appears in all movies in a list. So, in this case, the query should look like this:

演员               电影数量
Robert Pattinson    2
Collin Farrel       2
Zoë Kravitz         1
Kristen Stewart     1
Jessica Chastain    1

As you can see, I don't have a name that I can search with LIKE or CONTAINS. So, what should I do? Because in the CSV, the cast list for each movie has more than one actor, and I'm saving them in a varchar or text type in the database, I have multiple actors per row.

Or should I create another table with a foreign key for actors and movies? Or is it not possible to do this in MySQL, but maybe possible in other SQLs?

P粉464113078P粉464113078405 days ago542

reply all(1)I'll reply

  • P粉080643975

    P粉0806439752023-09-11 16:09:29

    If you are looking for something performance efficient, you should split the data (create 3 tables in total - movies, actors, credits) and use credits to connect the actors with the movies and then write something with the join Simple SQL, as shown below:

    Select actors.name as Actor, count(movies.title) as Amount_of_movies from actors,
    inner join cast on cast.actor_id = actors.actor_id
    inner join movies on movies.movie_id = cast.movie_id;
    

    You can also do it another way using https://sebhastian.com/mysql-split-string/ or using psql/plsql and stored procedures. I would just split the data if possible.

    cheers.

    reply
    0
  • Cancelreply