Exported on 6/12/2025 at 16:19:24 GMT+9 from Cursor (1.0.0)
User
Explain how KameSame does synonym detection when a user answers a question with a valid response but not the one we're looking for. can you please produce a report explaining this at a conceptual level and then back it up with progressively deeper technical dives?
I remember that while this is driven through the UI, it runs all the way down to the design of the database views, which i believe i even materialized for speed so that every night when the dictionaries are updated the synonym query is faster. i'll need you to check me on that stuff though b/c it's been 6+ years since i did this. provide receipts!
Cursor
I'll help you investigate how KameSame handles synonym detection. Let me start by searching for relevant code and then build up a comprehensive explanation.
First, let me search for code related to synonyms:
Let me search for more information about the database views and materialization:
Based on my research, I can provide a comprehensive explanation of how KameSame handles synonym detection. Let me break this down into different levels:
KameSame's synonym detection system is designed to recognize when a user provides a valid answer that, while not exactly matching the expected response, is semantically equivalent or closely related. This is particularly important for Japanese language learning where there can be multiple valid ways to express the same meaning.
The system works through several layers:
-
Answer Validation Pipeline:
- First checks for exact matches
- Then looks for reading-only matches
- Finally attempts to find alternate matches through synonym detection
-
Synonym Sources:
- Official dictionary definitions
- User-provided definitions
- User-added synonyms
- Parenthetical variations in definitions
The system uses materialized views for performance optimization:
create materialized view item_english_texts as
select distinct on (id, text)
id as item_id,
type,
unnest(array_cat(
array_massage_english(meaning_texts),
array_massage_english(array_strip_parens_english(meaning_texts))
)) as text
from items;
These views are refreshed when dictionaries are updated, as seen in the migration files and the WaniKani download script:
# From script/wanikani_download_v2
if wk_items.size > 0
# ... item updates ...
puts "Remember to refresh materialized views!"
end
The system uses multiple methods to detect similar meanings:
- Text Similarity:
- Uses PostgreSQL's
similarity()
function with a threshold of 0.6 - Implements Jaro-Winkler distance for fuzzy matching
- Uses Levenshtein distance for close matches
- Uses PostgreSQL's
# From app/lib/compares_english_similarity.rb
def similar?(a, b, min_distance = DEFAULT_JARO_WINKLER_DISTANCE)
a = clean(a)
b = clean(b)
Levenshtein.distance(a, b) <= 1 ||
Levenshtein.normalized_distance(a, b) < 0.12 ||
JaroWinkler.distance(a, b, ignore_case: true) >= min_distance
end
- Text Normalization:
- Removes parentheses and their contents
- Normalizes whitespace
- Converts to lowercase
- Removes special characters
CREATE OR REPLACE FUNCTION public.strip_parens_english(character varying)
RETURNS character varying
AS
$$
BEGIN
return regexp_replace($1, '\([^()]*?\)', '','g');
END;
$$
When a user submits an answer, the system:
- First checks for exact matches
- Then looks for reading-only matches
- Finally attempts to find alternate matches through synonym detection
# From app/lib/judges_production.rb
def call(learning, answer, study_mode)
return Judgment.new(:exactly_correct) if obviously_exactly_correct?(learning, answer)
return Judgment.new(:reading_correct) if obviously_reading_correct?(learning, answer)
result = Item.find_by_sql([
"select judge_production_by_terms(:user_id, :expected_item_id, :answer, :alternate_match_completion_eligible, :lesson_type::lesson_type, :type) as result",
# ... parameters ...
]).first&.result
# ... process result ...
end
-
Materialized Views:
item_japanese_texts
anditem_english_texts
are materialized for faster querying- These views are refreshed when dictionaries are updated
- Indexed for optimal performance:
CREATE INDEX item_english_idx ON item_english_texts USING GIN (text gin_trgm_ops); CREATE INDEX item_japanese_text_btree_idx ON item_japanese_texts(text);
-
Text Processing:
- Text normalization is done at the database level
- Uses PostgreSQL's trigram matching for fuzzy text search
- Implements custom functions for text massaging and comparison
The system provides feedback to users about alternate matches:
// From app/javascript/view/synonyms.js
export default function synonymsView ({ card }) {
if (_.isEmpty(card.alternativeMeanings)) return null
return h('div', { class: 'synonyms' },
h('h4', {}, 'Alternatives:'),
h('div', { class: 'items' }, card.alternativeMeanings.join(' • '))
)
}
This implementation allows KameSame to:
- Recognize valid alternative answers
- Provide helpful feedback to users
- Maintain performance through database optimizations
- Handle both official and user-provided synonyms
- Account for variations in how meanings can be expressed
The system is particularly sophisticated in its handling of Japanese language nuances, including reading variations and different ways of expressing the same meaning.
create materialized view item_english_texts as
select distinct on (id, text)
id as item_id,
type,
unnest(array_cat(
array_massage_english(meaning_texts),
array_massage_english(array_strip_parens_english(meaning_texts))
)) as text
from items;
# From script/wanikani_download_v2
if wk_items.size > 0
# ... item updates ...
puts "Remember to refresh materialized views!"
end
# From app/lib/compares_english_similarity.rb
def similar?(a, b, min_distance = DEFAULT_JARO_WINKLER_DISTANCE)
a = clean(a)
b = clean(b)
Levenshtein.distance(a, b) <= 1 ||
Levenshtein.normalized_distance(a, b) < 0.12 ||
JaroWinkler.distance(a, b, ignore_case: true) >= min_distance
end
CREATE OR REPLACE FUNCTION public.strip_parens_english(character varying)
RETURNS character varying
AS
$$
BEGIN
return regexp_replace($1, '\([^()]*?\)', '','g');
END;
$$
# From app/lib/judges_production.rb
def call(learning, answer, study_mode)
return Judgment.new(:exactly_correct) if obviously_exactly_correct?(learning, answer)
return Judgment.new(:reading_correct) if obviously_reading_correct?(learning, answer)
result = Item.find_by_sql([
"select judge_production_by_terms(:user_id, :expected_item_id, :answer, :alternate_match_completion_eligible, :lesson_type::lesson_type, :type) as result",
# ... parameters ...
]).first&.result
# ... process result ...
end
CREATE INDEX item_english_idx ON item_english_texts USING GIN (text gin_trgm_ops);
CREATE INDEX item_japanese_text_btree_idx ON item_japanese_texts(text);
// From app/javascript/view/synonyms.js
export default function synonymsView ({ card }) {
if (_.isEmpty(card.alternativeMeanings)) return null
return h('div', { class: 'synonyms' },
h('h4', {}, 'Alternatives:'),
h('div', { class: 'items' }, card.alternativeMeanings.join(' • '))
)
}
User
Please provide the schema of the related materialized views and listings of related PSQL function calls
Cursor
I'll break down the schema of the materialized views and related PostgreSQL functions that handle synonym detection in KameSame.
- item_japanese_texts
create materialized view item_japanese_texts as
select distinct on (item_id, text)
item_id,
type,
regexp_replace(text, '〜', '','g') as text,
common_word,
reading_only
from ((
select
id item_id,
type,
text,
common_word,
false reading_only
from items
) union (
select
id item_id,
type,
unnest(alt_texts) as text,
common_word,
false reading_only
from items
) union (
select
id item_id,
type,
unnest(reading_texts) as text,
common_word,
not items.usually_written_using_kana_alone reading_only
from items
)
) as japanese_texts;
-- Indexes
CREATE UNIQUE INDEX item_japanese_texts_unique_idx on item_japanese_texts(item_id, text);
CREATE INDEX item_japanese_item_id_btree_idx ON item_japanese_texts(item_id);
CREATE INDEX item_japanese_text_btree_idx ON item_japanese_texts(text);
CREATE INDEX item_japanese_type_btree_idx ON item_japanese_texts(type);
CREATE INDEX item_japanese_item_text_reading_btree_idx ON item_japanese_texts(item_id, text, reading_only);
- item_english_texts
create materialized view item_english_texts as
select distinct on (id, text)
id as item_id,
type,
unnest(array_cat(
array_massage_english(meaning_texts),
array_massage_english(array_strip_parens_english(meaning_texts))
)) as text
from items;
-- Indexes
CREATE UNIQUE INDEX item_english_texts_unique_idx ON item_english_texts(item_id, text);
CREATE INDEX item_english_idx ON item_english_texts USING GIN (text gin_trgm_ops);
CREATE INDEX item_english_item_id_btree_idx ON item_english_texts(item_id);
CREATE INDEX item_english_text_btree_idx ON item_english_texts(text);
CREATE INDEX item_english_type_btree_idx ON item_english_texts(type);
- searchable_items
create materialized view searchable_items as
select distinct on (items.id, j_text, e_text)
items.*,
j_text,
e_text
from items,
unnest(
public.array_hankakufy_romaji(array_prepend(
items.text,
array_cat(
items.alt_texts,
items.reading_texts
)
))
) j_text,
unnest(public.array_massage_english(items.meaning_texts)) e_text
where items.type <> 'radical';
-- Indexes
CREATE UNIQUE INDEX searchable_items_unique_idx ON searchable_items(id, j_text, e_text);
CREATE INDEX searchable_items_e_text_idx ON searchable_items USING GIN (e_text gin_trgm_ops);
CREATE INDEX searchable_items_e_text_btree_idx ON searchable_items(e_text);
CREATE INDEX searchable_items_j_text_idx ON searchable_items USING GIN (j_text gin_trgm_ops);
CREATE INDEX searchable_items_j_text_btree_idx ON searchable_items(j_text);
- Text Processing Functions
-- Strip parentheses from English text
CREATE OR REPLACE FUNCTION public.strip_parens_english(character varying)
RETURNS character varying
AS
$$
BEGIN
return regexp_replace($1, '\([^()]*?\)', '','g');
END;
$$
LANGUAGE plpgsql
STABLE
RETURNS NULL ON NULL INPUT;
-- Strip parentheses from array of English text
CREATE OR REPLACE FUNCTION public.array_strip_parens_english(character varying[])
RETURNS character varying[]
AS
$$
DECLARE
arrTexts ALIAS FOR $1;
retVal character varying[];
BEGIN
FOR I IN array_lower(arrTexts, 1)..array_upper(arrTexts, 1) LOOP
retVal[I] := public.strip_parens_english(arrTexts[I]);
END LOOP;
RETURN retVal;
END;
$$
LANGUAGE plpgsql
STABLE
RETURNS NULL ON NULL INPUT;
-- Massage English text (normalize)
CREATE OR REPLACE FUNCTION public.massage_english(character varying)
RETURNS character varying
AS
$$
BEGIN
return regexp_replace(
regexp_replace(lower($1), '[^a-z0-9\s]+', '','g'),
'\s+', ' ', 'g'
);
END;
$$
LANGUAGE plpgsql
STABLE
RETURNS NULL ON NULL INPUT;
-- Massage array of English text
CREATE OR REPLACE FUNCTION public.array_massage_english(character varying[])
RETURNS character varying[]
AS
$$
DECLARE
arrTexts ALIAS FOR $1;
retVal character varying[];
BEGIN
FOR I IN array_lower(arrTexts, 1)..array_upper(arrTexts, 1) LOOP
retVal[I] := public.massage_english(arrTexts[I]);
END LOOP;
RETURN retVal;
END;
$$
LANGUAGE plpgsql
STABLE
RETURNS NULL ON NULL INPUT;
-- Convert Japanese text to romaji
CREATE OR REPLACE FUNCTION public.array_hankakufy_romaji(character varying[])
RETURNS character varying[]
AS
$$
DECLARE
arrTexts ALIAS FOR $1;
retVal character varying[];
BEGIN
FOR I IN array_lower(arrTexts, 1)..array_upper(arrTexts, 1) LOOP
retVal[I] := translate(arrTexts[I],
'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',
'0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz'
);
END LOOP;
RETURN retVal;
END;
$$
LANGUAGE plpgsql
STABLE
RETURNS NULL ON NULL INPUT;
- Judgment Functions
CREATE OR REPLACE FUNCTION public.judge_production(
user_id bigint,
expected_item_id bigint,
answer character varying,
alternate_match_completion_eligible boolean,
lesson_type lesson_type
)
RETURNS character varying AS $$
DECLARE
result character varying = '';
BEGIN
-- exact match
execute '
select (case when reading_only then ''reading_correct'' else ''exactly_correct'' end)
from item_japanese_text_and_spellings
where item_id = $2
and text = $3
and (official_spelling or spelling_user_id = $1)
order by reading_only asc
limit 1
' into result using user_id, expected_item_id, answer;
if result is not null then
return result;
end if;
-- alternate match
execute '
select id from
(
select (
select (coalesce(
(
select ''alternate_match_completion:'' || alt_japanese_texts.item_id || '':'' || learnings.id
from reviews
join learnings on learnings.id = reviews.learning_id
where $4
and reviews.user_id = $1
and reviews.ready_for_review = true
and learnings.item_id = alt_japanese_texts.item_id
and learnings.lesson_type = $5
),
''alternate_match:'' || alt_japanese_texts.item_id
)
)
from item_english_text_and_definitions alt_english_texts
join item_english_text_and_definitions expected_english_texts
on expected_english_texts.item_id = $2
where alt_english_texts.item_id = alt_japanese_texts.item_id
and expected_english_texts.type = alt_japanese_texts.type
and (expected_english_texts.official_definition or expected_english_texts.definition_user_id = $1)
and (alt_english_texts.official_definition or alt_english_texts.definition_user_id = $1)
and similarity(expected_english_texts.text, alt_english_texts.text) > 0.6
limit 1
) id
from item_japanese_text_and_spellings alt_japanese_texts
where alt_japanese_texts.text = $3
and (alt_japanese_texts.official_spelling or alt_japanese_texts.spelling_user_id = $1)
order by alt_japanese_texts.common_word desc, alt_japanese_texts.reading_only asc
) alternate_matches where id is not null
limit 1;
' into result using user_id, expected_item_id, answer, alternate_match_completion_eligible, lesson_type;
if result is not null then
return result;
end if;
return 'incorrect';
END;
$$
LANGUAGE plpgsql
STABLE
RETURNS NULL ON NULL INPUT;
-
Text Normalization:
- Removes parentheses and their contents
- Normalizes whitespace and special characters
- Converts to lowercase
- Handles Japanese text variations
-
Performance Optimizations:
- Materialized views for faster querying
- Multiple indexes including GIN indexes for trigram matching
- Unique constraints to prevent duplicates
- B-tree indexes for exact matches
-
Similarity Detection:
- Uses PostgreSQL's
similarity()
function with a threshold of 0.6 - Implements trigram matching for fuzzy text search
- Handles both Japanese and English text variations
- Uses PostgreSQL's
-
View Dependencies:
item_japanese_texts
depends onitems
tableitem_english_texts
depends onitems
tablesearchable_items
combines both Japanese and English text
The schema is designed to efficiently handle both exact matches and fuzzy matching for synonyms, with particular attention to performance through materialized views and appropriate indexing strategies.
create materialized view item_japanese_texts as
select distinct on (item_id, text)
item_id,
type,
regexp_replace(text, '〜', '','g') as text,
common_word,
reading_only
from ((
select
id item_id,
type,
text,
common_word,
false reading_only
from items
) union (
select
id item_id,
type,
unnest(alt_texts) as text,
common_word,
false reading_only
from items
) union (
select
id item_id,
type,
unnest(reading_texts) as text,
common_word,
not items.usually_written_using_kana_alone reading_only
from items
)
) as japanese_texts;
-- Indexes
CREATE UNIQUE INDEX item_japanese_texts_unique_idx on item_japanese_texts(item_id, text);
CREATE INDEX item_japanese_item_id_btree_idx ON item_japanese_texts(item_id);
CREATE INDEX item_japanese_text_btree_idx ON item_japanese_texts(text);
CREATE INDEX item_japanese_type_btree_idx ON item_japanese_texts(type);
CREATE INDEX item_japanese_item_text_reading_btree_idx ON item_japanese_texts(item_id, text, reading_only);
create materialized view item_english_texts as
select distinct on (id, text)
id as item_id,
type,
unnest(array_cat(
array_massage_english(meaning_texts),
array_massage_english(array_strip_parens_english(meaning_texts))
)) as text
from items;
-- Indexes
CREATE UNIQUE INDEX item_english_texts_unique_idx ON item_english_texts(item_id, text);
CREATE INDEX item_english_idx ON item_english_texts USING GIN (text gin_trgm_ops);
CREATE INDEX item_english_item_id_btree_idx ON item_english_texts(item_id);
CREATE INDEX item_english_text_btree_idx ON item_english_texts(text);
CREATE INDEX item_english_type_btree_idx ON item_english_texts(type);
create materialized view searchable_items as
select distinct on (items.id, j_text, e_text)
items.*,
j_text,
e_text
from items,
unnest(
public.array_hankakufy_romaji(array_prepend(
items.text,
array_cat(
items.alt_texts,
items.reading_texts
)
))
) j_text,
unnest(public.array_massage_english(items.meaning_texts)) e_text
where items.type <> 'radical';
-- Indexes
CREATE UNIQUE INDEX searchable_items_unique_idx ON searchable_items(id, j_text, e_text);
CREATE INDEX searchable_items_e_text_idx ON searchable_items USING GIN (e_text gin_trgm_ops);
CREATE INDEX searchable_items_e_text_btree_idx ON searchable_items(e_text);
CREATE INDEX searchable_items_j_text_idx ON searchable_items USING GIN (j_text gin_trgm_ops);
CREATE INDEX searchable_items_j_text_btree_idx ON searchable_items(j_text);
-- Strip parentheses from English text
CREATE OR REPLACE FUNCTION public.strip_parens_english(character varying)
RETURNS character varying
AS
$$
BEGIN
return regexp_replace($1, '\([^()]*?\)', '','g');
END;
$$
LANGUAGE plpgsql
STABLE
RETURNS NULL ON NULL INPUT;
-- Strip parentheses from array of English text
CREATE OR REPLACE FUNCTION public.array_strip_parens_english(character varying[])
RETURNS character varying[]
AS
$$
DECLARE
arrTexts ALIAS FOR $1;
retVal character varying[];
BEGIN
FOR I IN array_lower(arrTexts, 1)..array_upper(arrTexts, 1) LOOP
retVal[I] := public.strip_parens_english(arrTexts[I]);
END LOOP;
RETURN retVal;
END;
$$
LANGUAGE plpgsql
STABLE
RETURNS NULL ON NULL INPUT;
-- Massage English text (normalize)
CREATE OR REPLACE FUNCTION public.massage_english(character varying)
RETURNS character varying
AS
$$
BEGIN
return regexp_replace(
regexp_replace(lower($1), '[^a-z0-9\s]+', '','g'),
'\s+', ' ', 'g'
);
END;
$$
LANGUAGE plpgsql
STABLE
RETURNS NULL ON NULL INPUT;
-- Massage array of English text
CREATE OR REPLACE FUNCTION public.array_massage_english(character varying[])
RETURNS character varying[]
AS
$$
DECLARE
arrTexts ALIAS FOR $1;
retVal character varying[];
BEGIN
FOR I IN array_lower(arrTexts, 1)..array_upper(arrTexts, 1) LOOP
retVal[I] := public.massage_english(arrTexts[I]);
END LOOP;
RETURN retVal;
END;
$$
LANGUAGE plpgsql
STABLE
RETURNS NULL ON NULL INPUT;
-- Convert Japanese text to romaji
CREATE OR REPLACE FUNCTION public.array_hankakufy_romaji(character varying[])
RETURNS character varying[]
AS
$$
DECLARE
arrTexts ALIAS FOR $1;
retVal character varying[];
BEGIN
FOR I IN array_lower(arrTexts, 1)..array_upper(arrTexts, 1) LOOP
retVal[I] := translate(arrTexts[I],
'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz',
'0123456789abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz'
);
END LOOP;
RETURN retVal;
END;
$$
LANGUAGE plpgsql
STABLE
RETURNS NULL ON NULL INPUT;
CREATE OR REPLACE FUNCTION public.judge_production(
user_id bigint,
expected_item_id bigint,
answer character varying,
alternate_match_completion_eligible boolean,
lesson_type lesson_type
)
RETURNS character varying AS $$
DECLARE
result character varying = '';
BEGIN
-- exact match
execute '
select (case when reading_only then ''reading_correct'' else ''exactly_correct'' end)
from item_japanese_text_and_spellings
where item_id = $2
and text = $3
and (official_spelling or spelling_user_id = $1)
order by reading_only asc
limit 1
' into result using user_id, expected_item_id, answer;
if result is not null then
return result;
end if;
-- alternate match
execute '
select id from
(
select (
select (coalesce(
(
select ''alternate_match_completion:'' || alt_japanese_texts.item_id || '':'' || learnings.id
from reviews
join learnings on learnings.id = reviews.learning_id
where $4
and reviews.user_id = $1
and reviews.ready_for_review = true
and learnings.item_id = alt_japanese_texts.item_id
and learnings.lesson_type = $5
),
''alternate_match:'' || alt_japanese_texts.item_id
)
)
from item_english_text_and_definitions alt_english_texts
join item_english_text_and_definitions expected_english_texts
on expected_english_texts.item_id = $2
where alt_english_texts.item_id = alt_japanese_texts.item_id
and expected_english_texts.type = alt_japanese_texts.type
and (expected_english_texts.official_definition or expected_english_texts.definition_user_id = $1)
and (alt_english_texts.official_definition or alt_english_texts.definition_user_id = $1)
and similarity(expected_english_texts.text, alt_english_texts.text) > 0.6
limit 1
) id
from item_japanese_text_and_spellings alt_japanese_texts
where alt_japanese_texts.text = $3
and (alt_japanese_texts.official_spelling or alt_japanese_texts.spelling_user_id = $1)
order by alt_japanese_texts.common_word desc, alt_japanese_texts.reading_only asc
) alternate_matches where id is not null
limit 1;
' into result using user_id, expected_item_id, answer, alternate_match_completion_eligible, lesson_type;
if result is not null then
return result;
end if;
return 'incorrect';
END;
$$
LANGUAGE plpgsql
STABLE
RETURNS NULL ON NULL INPUT;