Facial Recognition: SQL for separating one person into two (or more) #18974

czroth · 2025-06-06T16:24:02Z

The facial recognition software lumped my two sons together, but I didn't want to reprocess everything as the threshold was already splitting up my daughters into ~20 people each. I have a large library spanning about 20 years that I uploaded in multiple phases.

So I wrote some SQL to help me reassign the incorrect cases. The errors were almost always in one direction, so in my case I only had to check the older son's photos.
I used the names Bob and Charlie for B and C so that it's a little clearer for others wanting to copy the code. You have to select photos that will serve as a reference for each person. I used about 20 per person, covering different ages, facial expressions, and lighting conditions. Put their asset IDs into the lists in bob_reference_photos and charlie_reference_photos. (Be sure the faces in these photos are labelled with the correct person!)

Then every face in the photos_to_split list is compared against both sets of reference faces. Two metrics are computed per person: the minimum distance to any reference face, and a "reciprocal distance" over the three best matches (the inverse of the mean of the inverse distances, i.e. their harmonic mean, similar to adding resistors in parallel).
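The two metrics described above can be sketched in Python as follows. This is only an illustration of the arithmetic, not part of the query: the function names min_distance and reciprocal_distance and the sample distances are made up, and the 0.00001 offset mirrors the one in the SQL that guards against division by zero.

```python
def min_distance(distances):
    """Smallest distance between a face and any reference face."""
    return min(distances)


def reciprocal_distance(distances, top_n=3, eps=0.00001):
    """Inverse of the mean inverse distance over the top_n closest references."""
    best = sorted(distances)[:top_n]
    return 1 / (sum(1 / (d + eps) for d in best) / len(best))


# Hypothetical distances from one face to four reference faces.
sample = [0.9, 0.6, 1.2, 0.3]
print(min_distance(sample))
print(round(reciprocal_distance(sample), 3))
```

Because it averages inverses, the reciprocal distance is pulled towards the closest matches, while still being less sensitive to a single lucky match than the plain minimum.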

You can tinker with the thresholds at the bottom of the SQL. I found that a difference below -1 indicated the face should be relabelled as the other person; differences between -1 and 1 I checked manually and reassigned where needed; anything above 1 I took as correctly labelled. I've left the actual update commented out so that you can test the queries with a select before pulling the trigger on updating the tables.
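As a rough illustration of the decision rule described above, the sketch below sorts a face into one of three buckets from a single difference score (Charlie's distance minus Bob's distance). The function name classify and the bucket labels are hypothetical, and the cut-offs of -1 and 1 are the ones quoted in the text. The SQL combines two difference metrics and its posted numbers are not identical to these, so treat them as starting points to tune.

```python
def classify(difference, lower=-1.0, upper=1.0):
    """Bucket a face by its score difference (Charlie distance - Bob distance)."""
    if difference < lower:
        return "reassign"  # much closer to Charlie's references
    if difference <= upper:
        return "review"    # too close to call, check manually
    return "keep"          # clearly closer to Bob's references


for diff in (-1.8, 0.2, 2.4):
    print(diff, classify(diff))
```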

Using this, I avoided having to manually change 2700 incorrect categorizations, and only had to fix about 160 by hand.

If there are any front-end and back-end devs who want to try implementing this as a service within the app, I'd be happy to help. I code mostly in Python and SQL, though, so I can't take this on by myself.

with photos_to_split as (
    select af.id, "assetId", embedding from asset_faces af
    join person p on af."personId" = p.id
    join face_search f on af.id = f."faceId"
    join assets a on af."assetId" = a.id
    where p.name = 'Bob'
--    and a."localDateTime" > '2010-01-01'  -- optionally add Charlie's birthdate to not split photos before this date
), bob_reference_photos as (
    select name, f.embedding, "personId"
    from asset_faces af
    join face_search f on af.id = f."faceId"
    join person p on af."personId" = p.id
    where af."assetId" in (
      'e80e5574-9664-4c1c-956c-4cc0fa1b6a6e', 
      'ea70f191-3c74-4d2b-8a4a-27a379dc9ea9', 
      'a7c8844e-837a-44a7-bb7b-2c540c680a0f', 
      '34444814-781d-4a6e-910e-10e287d73100', 
      'd1ba2ed9-7b95-4ef4-9dc0-0011d8e0467d', 
      'e5326e75-a95b-4192-87df-a59380caacb5', 
      'c54ba841-4990-4376-a4f6-9ef5c9ef4385', 
      'eae6621c-745f-4f3b-aaa5-419fa8d424fa', 
      '0c57acff-12b7-4e72-83fa-98d0dfd8d241', 
      '0b08a3f8-8cce-4c67-ab95-6ffaed5cb286', 
      '8a66b85f-7ee2-47ff-beec-8474047e3d4a', 
      'ea7c0bf4-ee34-4e6f-b552-89017cd2e2ff'
        )
     and name = 'Bob'
), bob_scoring as (
    select p."assetId", p.id, b.embedding <-> p.embedding as distance from bob_reference_photos b
    cross join photos_to_split p
), bob_ranked as (
    select "assetId", id, rank() over(partition by "assetId", id order by distance) as rank, distance from bob_scoring
), bob_aggregate as (
    select "assetId", id, 1/avg(1/(br.distance+0.00001)) as distance, min(distance) as min_distance from bob_ranked br where rank <= 3 group by "assetId", id
), charlie_reference_photos as (
    select name, f.embedding, "personId"
    from asset_faces af
    join face_search f on af.id = f."faceId"
    join person p on af."personId" = p.id
    where af."assetId" in (
      '0b966a20-e28f-4641-9cb8-79f79f83c7ca', 
      '726d6a27-c3b8-4eb0-b5a7-e7f84b7f0aa6', 
      '55282e60-7a9a-4d55-8f11-333177b9d9a2', 
      '8a333269-1982-4cd2-8e4c-f3ba66517a29', 
      '18c74676-25ef-4d90-8400-023d5d5f0ed3', 
      '3ed61f82-5e20-4435-a5bd-38a532f061a7', 
      '826aa9d5-7300-4dd4-8878-c0aee05d4f65', 
      'affda013-c17b-4e5d-ace0-bb5e43ea9e79', 
      '9e70d081-593d-427f-a0ed-c266908144d1', 
      '0f738be3-6a74-4a92-b5a8-9b8ca09b9b8f', 
      'b6338a98-f92a-4abb-a6b5-5707f7a8a8da', 
      '7872e742-140c-4f31-86e5-dd79776a6f1d'
        )
     and name = 'Charlie'
), charlie_scoring as (
    select p."assetId", p.id, c.embedding <-> p.embedding as distance from charlie_reference_photos c
    cross join photos_to_split p
), charlie_ranked as (
    select "assetId", id, rank() over(partition by "assetId", id order by distance) as rank, distance from charlie_scoring
), charlie_aggregate as (
    select "assetId", id, 1/avg(1/(br.distance+0.00001)) as distance, min(distance) as min_distance from charlie_ranked br where rank <= 3 group by "assetId", id
), scoring as (
    select ba."assetId", ba.id, ba.distance as bob_distance, ca.distance as charlie_distance, ca.distance - ba.distance as reciprocal_difference, ca.min_distance - ba.min_distance as min_difference
    from bob_aggregate ba
    join charlie_aggregate ca on ba."assetId" = ca."assetId" and ba.id = ca.id
    order by min_difference
), to_update as (
    select s."assetId", s.id, s.reciprocal_difference, s.min_difference from scoring s
    join photos_to_split p on s."assetId" = p."assetId" and s.id = p.id
    where min_difference <= -1 or (reciprocal_difference <= -1.25 and min_difference <= -0.5)
    order by min_difference
), uncertain as (
    select s."assetId", s.id, s.reciprocal_difference, s.min_difference from scoring s
    join photos_to_split p on s."assetId" = p."assetId" and s.id = p.id
    where min_difference between 2.5 and 4 and reciprocal_difference <= 1.5
    order by min_difference
)
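-- To apply the changes, replace the select below with this update
-- (after checking the rows returned by "select * from to_update"):
-- update asset_faces
-- set "personId" = 'd7ee552d-171f-4f77-88c2-d73af68dc70b'  -- Charlie's person ID.
-- where id in (select id from to_update);
select *, concat('https://<your_instance>/photos/', "assetId") from uncertain;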

czroth · 2025-06-06T16:25:54Z

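As the title says, this extends to more than two people: if you have three candidates, just add another set of sub-queries (reference photos, scoring, ranked, aggregate) for the third person and include it in the final comparison.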
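One way to picture the extension to three or more people: compute the same aggregate distance against each person's reference set, then compare the best score with the runner-up. The sketch below is an assumption about how that could be generalised, not something taken from the query; closest_person, the names, and the scores are hypothetical.

```python
def closest_person(scores):
    """Return (best_name, margin) from a dict of name -> aggregate distance.

    margin is the gap between the runner-up and the best score; a small
    margin suggests the face should be reviewed manually.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1])
    (best_name, best), (_, runner_up) = ranked[0], ranked[1]
    return best_name, runner_up - best


# Hypothetical aggregate distances for one face against three reference sets.
name, margin = closest_person({"Bob": 1.4, "Charlie": 0.5, "Dave": 1.1})
print(name, round(margin, 2))
```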

czroth · 2025-06-06T16:29:53Z

I'm also thinking this could be useful as a setup step in Immich for when you are about to upload a large library. Start by asking for a cohort of photos of your most photographed people at different ages, make sure these are properly labelled, and then upload the rest. Each newly recognized face can then be checked against this shortlist first, and those that don't match can be handled as they are right now.
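A minimal sketch of that idea, under stated assumptions: the function match_shortlist, the max_distance cut-off, and the toy embeddings are all hypothetical, and nothing here reflects how Immich actually assigns faces. It only shows the "check the shortlist first, otherwise fall back" flow.

```python
import math


def match_shortlist(embedding, shortlist, max_distance=0.5):
    """Return the shortlisted person whose reference face is closest,
    or None if no reference is within max_distance (in which case the
    face would go through the normal clustering instead)."""
    best_name, best_distance = None, max_distance
    for name, references in shortlist.items():
        for reference in references:
            distance = math.dist(embedding, reference)
            if distance <= best_distance:
                best_name, best_distance = name, distance
    return best_name


# Toy 2-D embeddings; real face embeddings have many more dimensions.
shortlist = {"Bob": [(0.0, 0.0), (0.1, 0.0)], "Charlie": [(1.0, 1.0)]}
print(match_shortlist((0.9, 1.0), shortlist))
print(match_shortlist((5.0, 5.0), shortlist))
```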
