-
Notifications
You must be signed in to change notification settings - Fork 729
Final tweaks to member enrichment #2704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…rified to prevent multiple org matches
WalkthroughThe pull request introduces significant modifications across multiple files in the member enrichment worker service. Key changes include the addition of credit checks in data fetching, enhancements to logging levels, and improved processing logic for member data. New functions for handling string sanitization are added, and the handling of member organizations is refined. Additionally, a new model variant is introduced in the LLM settings. Overall, the updates aim to streamline member enrichment processes and improve data integrity. Changes
Possibly related PRs
Suggested labels
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
….com:CrowdDotDev/crowd.dev into bugfix/check-org-existence-in-enrichment-fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (7)
services/apps/members_enrichment_worker/src/sources/progai/service.ts (2)
277-280
: Use URL Parsing for More Reliable GitHub Link DetectionInstead of using
includes('github')
to check ifcompanyUrl
is a GitHub link, consider parsing the URL to extract the hostname and compare it directly to'github.com'
. This will handle edge cases and make the check more robust.
294-295
: Use URL Parsing to Extract LinkedIn Company IdentifierUsing
split('/').pop()
may not reliably extract the company identifier fromcompanyLinkedInUrl
, especially if the URL has trailing slashes or query parameters. Consider using a URL parser to extract the path segments accurately.services/apps/members_enrichment_worker/src/activities/enrichment.ts (1)
610-610
: Typographical Error in Variable NameThe variable
profilesFromUnverfiedIdentities
has a typo. It should beprofilesFromUnverifiedIdentities
.Apply this diff to correct the typo:
- const profilesFromUnverfiedIdentities: IMemberEnrichmentDataNormalized[] = [] + const profilesFromUnverifiedIdentities: IMemberEnrichmentDataNormalized[] = []Also, update all occurrences of this variable throughout the code.
services/apps/members_enrichment_worker/src/workflows/enrichMember.ts (1)
50-54
: Consider adding logging for credit exhaustionWhile the credit check logic is sound, it would be helpful to log when enrichment is skipped due to insufficient credits. This would aid in monitoring and debugging.
if (!(await hasRemainingCredits(source))) { + this.log.info(`Skipping enrichment for source ${source} due to insufficient credits`, { + memberId: input.id, + source, + }); await touchMemberEnrichmentCacheUpdatedAt(source, input.id) continue }services/apps/members_enrichment_worker/src/sources/serp/service.ts (1)
138-142
: Consider enhancing LinkedIn handle extractionWhile the regex pattern is correct for standard LinkedIn profile URLs, consider handling additional edge cases:
- URLs with tracking parameters after the handle
- Language-specific LinkedIn domains
- Handles containing special characters
private getLinkedInProfileHandle(url: string): string | null { - const regex = /in\/([^/]+)/ + const regex = /in\/([^/?#]+)/ const match = url.match(regex) - return match ? match[1] : null + return match ? decodeURIComponent(match[1]) : null }services/apps/members_enrichment_worker/src/sources/crustdata/service.ts (1)
355-363
: Consider extracting the sanitization logic into a helper function.While the implementation is correct, the repeated use of
replaceDoubleQuotes
could be simplified by extracting it into a helper function.Consider this refactor:
+ private sanitizeEmploymentFields(fields: { + name?: string; + title?: string; + description?: string; + }) { + return { + name: fields.name ? replaceDoubleQuotes(fields.name) : null, + title: fields.title ? replaceDoubleQuotes(fields.title) : null, + description: fields.description ? replaceDoubleQuotes(fields.description) : null, + }; + } private normalizeEmployment( data: IMemberEnrichmentDataCrustdata, normalized: IMemberEnrichmentDataNormalized, ): IMemberEnrichmentDataNormalized { // ... existing code ... if (employmentInformation.length > 0) { for (const workExperience of employmentInformation) { const identities = [] // ... existing identity code ... + const sanitizedFields = this.sanitizeEmploymentFields({ + name: workExperience.employer_name, + title: workExperience.employee_title, + description: workExperience.employer_linkedin_description, + }); normalized.memberOrganizations.push({ - name: replaceDoubleQuotes(workExperience.employer_name), + name: sanitizedFields.name, source: OrganizationSource.ENRICHMENT_CRUSTDATA, identities, - title: replaceDoubleQuotes(workExperience.employee_title), + title: sanitizedFields.title, startDate: workExperience?.start_date ?? null, endDate: workExperience?.end_date ?? null, - organizationDescription: replaceDoubleQuotes( - workExperience.employer_linkedin_description, - ), + organizationDescription: sanitizedFields.description, }) } }services/libs/data-access-layer/src/old/apps/members_enrichment_worker/index.ts (1)
130-130
: LGTM: Added bot filtering for member enrichmentThe addition of the bot filtering condition is a good optimization that prevents unnecessary processing of bot accounts. The implementation safely handles null values by defaulting to false.
Consider adding an index on the
(attributes->>'isBot')::boolean
expression if you expect a significant number of bot accounts, as this would improve the query performance.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (14)
services/apps/members_enrichment_worker/src/activities/enrichment.ts
(8 hunks)services/apps/members_enrichment_worker/src/bin/onboarding.ts
(1 hunks)services/apps/members_enrichment_worker/src/sources/clearbit/service.ts
(2 hunks)services/apps/members_enrichment_worker/src/sources/crustdata/service.ts
(2 hunks)services/apps/members_enrichment_worker/src/sources/progai/service.ts
(2 hunks)services/apps/members_enrichment_worker/src/sources/serp/service.ts
(1 hunks)services/apps/members_enrichment_worker/src/workflows/enrichMember.ts
(2 hunks)services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts
(3 hunks)services/libs/common/src/utils.ts
(1 hunks)services/libs/common_services/src/services/llm.service.ts
(1 hunks)services/libs/data-access-layer/src/old/apps/members_enrichment_worker/index.ts
(3 hunks)services/libs/data-access-layer/src/organizations/organizations.ts
(1 hunks)services/libs/types/src/enums/llm.ts
(1 hunks)services/libs/types/src/llm.ts
(3 hunks)
✅ Files skipped from review due to trivial changes (1)
- services/libs/common_services/src/services/llm.service.ts
🧰 Additional context used
🪛 GitHub Check: lint-format-services
services/libs/data-access-layer/src/old/apps/members_enrichment_worker/index.ts
[failure] 1-1:
Delete ;
[failure] 2-2:
Delete ;
[failure] 3-3:
Replace ·IAttributes,·IEnrichableMember,·IMemberEnrichmentCache,·IMemberEnrichmentSourceQueryInput,·IMemberIdentity,·IMemberOrganizationData,·IMemberOriginalData,·IOrganizationIdentity,·MemberEnrichmentSource,·MemberIdentityType,·OrganizationSource·}·from·'@crowd/types';⏎
with ⏎··IAttributes,⏎··IEnrichableMember,⏎··IMemberEnrichmentCache,⏎··IMemberEnrichmentSourceQueryInput,⏎··IMemberIdentity,⏎··IMemberOrganizationData,⏎··IMemberOriginalData,⏎··IOrganizationIdentity,⏎··MemberEnrichmentSource,⏎··MemberIdentityType,⏎··OrganizationSource,⏎}·from·'@crowd/types'
[failure] 704-704:
Insert ⏎
🔇 Additional comments (12)
services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts (2)
236-236
: Data Sanitization Applied to Squashed Attributes
Good job applying cleanAttributeValue
to ensure the squashed attributes are sanitized before being added to the payload. This helps maintain data integrity.
268-283
: Correct Implementation to Limit Verified Identities
The logic correctly ensures that only one verified identity is retained in each work experience by marking additional verified identities as unverified. This prevents potential database conflicts with duplicate verified identities.
services/apps/members_enrichment_worker/src/activities/enrichment.ts (1)
68-69
: Simplified Control Flow in getEnrichmentData Function
Removing the credit check from getEnrichmentData
simplifies the function's control flow. Ensure that credit checks are handled appropriately elsewhere to manage resource usage.
services/libs/types/src/enums/llm.ts (1)
3-3
: New LLM Model Type Added
The addition of CLAUDE_3_5_SONNET_V2
to the LlmModelType
enum is correctly implemented. This extends the available models for LLM queries.
services/libs/common/src/utils.ts (1)
76-76
: Improved Double Quote Replacement
The replaceDoubleQuotes
function now handles various Unicode representations of double quotes, enhancing data sanitization across different character sets.
services/libs/types/src/llm.ts (2)
46-49
: Confirm pricing accuracy for new model
The pricing configuration for CLAUDE_3_5_SONNET_V2 matches CLAUDE_3_5_SONNET. Please verify these values are accurate according to the provider's documentation.
62-62
: Verify impact of model change on enrichment quality
Multiple enrichment queries have been updated to use CLAUDE_3_5_SONNET_V2. While the model configurations are consistent, we should verify that the new model maintains or improves the quality of enrichment results.
Also applies to: 70-70, 78-78
services/apps/members_enrichment_worker/src/sources/clearbit/service.ts (2)
3-3
: LGTM: Import statement added correctly.
The replaceDoubleQuotes
utility function is imported from the common package.
220-223
: LGTM: Employment data sanitization implemented correctly.
The replaceDoubleQuotes
function is appropriately applied to both the organization name and title fields.
services/apps/members_enrichment_worker/src/bin/onboarding.ts (1)
30-30
: Verify resource implications of unlimited member processing.
Changing maxMembersToProcess
to Infinity
removes the upper bound on member processing. While this allows for complete processing of all eligible members, consider:
- Memory usage over extended runs
- Database connection pool limits
- API rate limits for enrichment services
The maxConcurrentProcessing
limit of 5 provides some protection, but monitoring system resources during extended runs is advised.
✅ Verification successful
Let me gather more specific information about memory management and concurrency controls in this worker.
Let me check the member processing implementation to understand the batching and database interaction patterns.
Based on the gathered information, I can now provide a final response about the resource implications:
Setting maxMembersToProcess to Infinity is acceptable with current safeguards
The implementation has several built-in protections against resource exhaustion:
- Concurrent processing is limited to 5 members at a time via
maxConcurrentProcessing = 5
- Members are processed in pages with controlled database queries that filter for specific criteria (>1 enrichment source, >100 activities)
- Each member workflow has a 15-minute timeout and retry limits
- Database load is naturally regulated through the temporal workflow queue
The Infinity
setting simply allows the worker to continue processing eligible members without an arbitrary cap, while the above controls prevent resource overload.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for any rate limiting configurations in the codebase
rg -A 3 'rate.*limit|throttle|maxRequest|concurrent'
# Look for memory management configurations
rg -A 3 'memory|heap|cache'
Length of output: 66264
Script:
#!/bin/bash
# Look for worker configuration and concurrency settings
rg -A 3 "maxConcurrentProcessing|batch.*size|chunk.*size|queue.*config" services/apps/members_enrichment_worker/
# Look for memory management in the worker
rg -A 3 "memory|heap" services/apps/members_enrichment_worker/
Length of output: 917
Script:
#!/bin/bash
# Look for database operations and member processing logic
rg -A 10 "getEnrichableMembers|processMember" services/apps/members_enrichment_worker/src/bin/onboarding.ts
# Check for any database connection pool configurations
rg -A 3 "pool.*size|connection.*pool" services/libs/data-access-layer/
Length of output: 2625
services/apps/members_enrichment_worker/src/sources/crustdata/service.ts (1)
3-3
: LGTM: Import statement added correctly.
The replaceDoubleQuotes
utility function is imported alongside the existing isEmail
import.
services/libs/data-access-layer/src/organizations/organizations.ts (1)
516-519
: LGTM: Appropriate logging level adjustment
The change from warn
to debug
level is appropriate here as this is an expected scenario in the normal flow, not a warning condition.
type: MemberIdentityType.USERNAME, | ||
verified: false, | ||
value: this.normalizeLinkedUrl(data.linkedinUrl).split('/').pop(), | ||
value: this.getLinkedInProfileHandle(this.normalizeLinkedUrl(data.linkedinUrl)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error handling for handle extraction
The normalize method should handle cases where getLinkedInProfileHandle returns null.
- value: this.getLinkedInProfileHandle(this.normalizeLinkedUrl(data.linkedinUrl)),
+ value: this.getLinkedInProfileHandle(this.normalizeLinkedUrl(data.linkedinUrl)) ??
+ this.log.warn('Failed to extract LinkedIn handle', { url: data.linkedinUrl }) || null,
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts (1)
268-282
: Consider extracting the identity verification logic into a separate functionWhile the logic for handling verified identities is correct, it could be more maintainable if extracted into a dedicated function. This would improve readability and make the code more testable.
Consider refactoring to:
+ function ensureSingleVerifiedIdentity(identities: any[]): any[] { + let found = false; + return (identities || []).map((i) => { + if (i.verified && !found) { + found = true; + return i; + } else if (i.verified) { + return { ...i, verified: false }; + } + return i; + }); + } workExperiencesSquashedByLLM.forEach((we) => { - let found = false - we.identities = (we.identities || []).map((i) => { - if (i.verified && !found) { - found = true; - return i; - } else if (i.verified) { - return { ...i, verified: false }; - } - return i; - }); + we.identities = ensureSingleVerifiedIdentity(we.identities); });This refactoring:
- Improves code organization and reusability
- Makes the logic easier to test in isolation
- Maintains the same functionality while being more maintainable
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
services/apps/members_enrichment_worker/src/activities/enrichment.ts
(8 hunks)services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- services/apps/members_enrichment_worker/src/activities/enrichment.ts
🔇 Additional comments (1)
services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts (1)
236-236
: LGTM: Consistent attribute value cleaning
The addition of cleanAttributeValue
for LLM-squashed attributes maintains data consistency with individually processed attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
services/apps/members_enrichment_worker/src/sources/progai/service.ts (2)
275-289
: Consider making domain verification logic more explicitThe GitHub URL exclusion logic is good, but consider extracting it into a separate method for better maintainability and reusability.
+ private isValidCompanyDomain(url: string, company: string): boolean { + const isGithubUrl = url.toLowerCase().includes('github') + const isGithubCompany = (company || '').toLowerCase().includes('github') + return !isGithubUrl && !isGithubCompany + } if (workExperience.companyUrl) { const normalizedDomain = websiteNormalizer(workExperience.companyUrl, false) - if ( - normalizedDomain && - !workExperience.companyUrl.toLowerCase().includes('github') && - !(workExperience.company || '').toLowerCase().includes('github') - ) { + if (normalizedDomain && this.isValidCompanyDomain(workExperience.companyUrl, workExperience.company)) {
320-336
: Consider enhancing LinkedIn URL handlingWhile the current implementation handles basic company and school URLs well, consider:
- Adding input validation
- Supporting more LinkedIn URL formats (e.g., with query parameters)
private getLinkedInProfileHandle(url: string): string | null { + if (!url || typeof url !== 'string') { + return null; + } + + // Remove query parameters and trailing slashes + const cleanUrl = url.split('?')[0].replace(/\/+$/, ''); + let regex = /company\/([^/]+)/ - let match = url.match(regex) + let match = cleanUrl.match(regex) if (match) { return `company:${match[1]}` } regex = /school\/([^/]+)/ - match = url.match(regex) + match = cleanUrl.match(regex) if (match) { return `school:${match[1]}` } return null }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
services/apps/members_enrichment_worker/src/sources/progai/service.ts
(2 hunks)
🔇 Additional comments (2)
services/apps/members_enrichment_worker/src/sources/progai/service.ts (2)
4-4
: LGTM: Good addition of string sanitization utility
The addition of replaceDoubleQuotes
from @crowd/common
is a good practice for sanitizing company names and titles.
264-268
: Good improvement in null handling
The enhanced conditions ensure we only process work experiences with valid company information, preventing potential issues with null values.
startDate: workExperience.startDate | ||
? workExperience.startDate.replace('Z', '+00:00') | ||
: null, | ||
endDate: workExperience.endDate ? workExperience.endDate.replace('Z', '+00:00') : null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Based on the code analysis, I can now provide the final response:
Add date format validation for work experience dates
The code assumes dates from the ProgAI API are in a format ending with 'Z' (UTC timezone marker), but there's no validation to ensure this. The type definition shows that startDate
and endDate
are just string types without format constraints. Consider:
- Adding input validation to ensure dates match the expected format before manipulation
- Using a date parsing library like
date-fns
or similar to handle the timezone conversion safely - Adding error handling for malformed dates
🔗 Analysis chain
Validate date string manipulation
The date string manipulation assumes a specific format. Consider adding validation or using a date manipulation library to handle different date formats safely.
Let me check the input date format and validation in the codebase.
Let me check if there's any date format validation or parsing before this transformation.
Let me check the entire file content to understand the context better.
Let me check the type definitions for the work experience data structure.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for potential date format issues in the codebase
rg -A 2 "replace\('Z', '\+00:00'\)"
Length of output: 608
Script:
#!/bin/bash
# Check the workExperience type definition and any date validation
ast-grep --pattern 'interface $_WorkExperience {
$$$
startDate$_
$$$
endDate$_
$$$
}'
# Check for any date validation or parsing logic
rg -B 2 -A 2 "workExperience\.(startDate|endDate)"
Length of output: 1053
Script:
#!/bin/bash
# Check for date validation or parsing in the file
rg -B 5 -A 5 "function.*workExperience" services/apps/members_enrichment_worker/src/sources/progai/service.ts
# Check if there's any date validation utility being used
rg -l "import.*date|moment|luxon|dayjs" services/apps/members_enrichment_worker/src/sources/progai/service.ts
# Look for any date format validation
ast-grep --pattern 'function $_(workExperience$_) {
$$$
}'
Length of output: 286
Script:
#!/bin/bash
# Get the content of the service file
cat services/apps/members_enrichment_worker/src/sources/progai/service.ts
# Check for any date-related types or interfaces
ast-grep --pattern 'type $_ = {
$$$
date$_
$$$
}'
# Check for any date validation functions
rg -l "validateDate|isValidDate|parseDate" services/apps/members_enrichment_worker/src/sources/progai/service.ts
Length of output: 11440
Script:
#!/bin/bash
# Check the types file for work experience interface
cat services/apps/members_enrichment_worker/src/sources/progai/types.ts
# Check if there's any date format validation in the utils
rg -l "date" services/apps/members_enrichment_worker/src/utils/
# Look for any date-related imports in the types file
rg "import.*date" services/apps/members_enrichment_worker/src/sources/progai/types.ts
Length of output: 2202
…eriences from enrichment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts (2)
268-282
: Consider enhancing identity selection logicWhile the logic to prevent multiple verified identities is sound, consider:
- Adding logging when identities are marked as unverified
- Implementing a more deterministic selection criteria (e.g., based on timestamp or data quality)
workExperiencesSquashedByLLM.forEach((we) => { let found = false we.identities = (we.identities || []).map((i) => { if (i.verified && !found) { found = true return i } else if (i.verified) { + console.log(`Marking identity as unverified for organization to prevent merge issues: ${JSON.stringify(i)}`) return { ...i, verified: false } } return i }) })
Line range hint
1-307
: Consider architectural improvementsWhile the code is functional, consider the following improvements:
- Break down the large
processMemberSources
function into smaller, focused functions for better maintainability:
squashIdentities
squashAttributes
squashWorkExperiences
- Add performance monitoring for LLM operations
- Enhance error handling for edge cases
This would improve maintainability and make the code easier to test.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
services/apps/members_enrichment_worker/src/activities.ts
(2 hunks)services/apps/members_enrichment_worker/src/activities/enrichment.ts
(13 hunks)services/apps/members_enrichment_worker/src/workflows/enrichMember.ts
(2 hunks)services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- services/apps/members_enrichment_worker/src/workflows/enrichMember.ts
- services/apps/members_enrichment_worker/src/activities/enrichment.ts
🔇 Additional comments (5)
services/apps/members_enrichment_worker/src/activities.ts (2)
11-11
: LGTM! Import added in correct alphabetical order.
The new import follows the existing pattern and maintains alphabetical ordering.
79-79
: Consider maintaining alphabetical order in exports and verify credit check implementation.
While the export addition is functionally correct, consider maintaining alphabetical ordering in the exports list for better maintainability. Additionally, since this function is critical for managing enrichment credits, ensure it's properly integrated across all enrichment workflows.
Let's verify the credit check implementation:
✅ Verification successful
Credit check implementation is properly integrated and working as expected.
The verification shows that:
- The
hasRemainingCredits
function is properly defined in the interface (types.ts
) and implemented by all enrichment sources - It's correctly used in the enrichment workflow to check credits before proceeding with enrichment
- The implementation includes proper caching mechanism in Redis to avoid frequent API calls
- Each source has its own implementation:
- SERP API checks account endpoint
- CrustData checks credits endpoint
- Others have placeholder implementations returning true
The only minor suggestion about alphabetical ordering in exports remains valid, but the credit check implementation is robust and properly integrated.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the credit check implementation and its usage
# Check where hasRemainingCredits is defined and how it's implemented
ast-grep --pattern 'function hasRemainingCredits($_) {
$$$
}'
# Check where hasRemainingCredits is used in enrichment workflows
rg -A 5 'hasRemainingCredits'
Length of output: 8838
services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts (3)
236-236
: LGTM: Consistent attribute sanitization
Good addition of sanitization for LLM-processed attributes, maintaining data consistency with other attribute processing.
268-282
: Verify impact on existing organizations
The new identity verification logic could affect organization merging behavior.
297-297
: Verify CrustData profile selection impact
The addition of the CrustData profile selection flag affects member updates.
✅ Verification successful
CrustData profile selection flag is correctly implemented
The flag isHighConfidenceSourceSelectedForWorkExperiences
(passed as !!crustDataProfileSelected
) is used to determine how work experiences should be handled during member updates. When CrustData is selected as the source:
- All existing work experiences are deleted (including manually created ones)
- All new work experiences from CrustData are created
- This indicates high confidence in CrustData's accuracy for work experience data
This implementation is intentional and aligns with the workflow's purpose of preferring CrustData's work experience data when available.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify how the CrustData profile selection flag affects member updates
# Search for the usage of this flag in the update logic
ast-grep --pattern 'updateMemberUsingSquashedPayload.*crustData'
# Look for related CrustData processing logic
rg -A 5 'CrustData.*profile|profile.*CrustData'
Length of output: 116
Script:
#!/bin/bash
# Let's try a broader search to understand the context better
# Search for updateMemberUsingSquashedPayload function definition and usage
ast-grep --pattern 'function updateMemberUsingSquashedPayload'
# Search for CrustData related code
rg -l "CrustData"
# Search for the specific file content to understand the context
cat services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts
Length of output: 10651
Script:
#!/bin/bash
# Let's look for the activity implementation to understand how the flag is used
# Search for updateMemberUsingSquashedPayload implementation
fd -e ts updateMemberUsingSquashedPayload --exec cat {}
# Search for any related configuration or constants
rg -A 5 "updateMemberUsingSquashedPayload.*crustData" services/apps/members_enrichment_worker/src/activities/
Length of output: 171
Script:
#!/bin/bash
# Let's try a different approach to locate the activity implementation
# Look for the activities directory content
ls -R services/apps/members_enrichment_worker/src/activities/
# Try to find the enrichment activity file
cat services/apps/members_enrichment_worker/src/activities/enrichment.ts
Length of output: 33974
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (1)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (1)
233-233
: Consider keeping important state changes at INFO level.While reducing log verbosity is good, consider keeping important state changes at INFO level for better operational visibility. For example, member attribute updates and reach changes are significant state modifications that operations might want to monitor without enabling DEBUG logging.
Also applies to: 252-252, 284-284, 302-302
export async function syncMember(memberId: string): Promise<void> { | ||
const syncApi = new SearchSyncApiClient({ | ||
baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'], | ||
}) | ||
|
||
// Convert strings to timestamps, using fallbacks for missing dates | ||
const start1 = d1Start ? new Date(d1Start).getTime() : -Infinity | ||
const end1 = d1End ? new Date(d1End).getTime() : Infinity | ||
const start2 = d2Start ? new Date(d2Start).getTime() : -Infinity | ||
const end2 = d2End ? new Date(d2End).getTime() : Infinity | ||
await syncApi.triggerMemberSync(memberId, { withAggs: false }) | ||
} | ||
|
||
export async function syncOrganization(organizationId: string): Promise<void> { | ||
const syncApi = new SearchSyncApiClient({ | ||
baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'], | ||
}) | ||
|
||
// Periods intersect if one period's start is before other period's end | ||
// and that same period's end is after the other period's start | ||
return start1 <= end2 && end1 >= start2 | ||
await syncApi.triggerOrganizationSync(organizationId, undefined, { withAggs: false }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling and environment validation for sync functions.
The sync functions need:
- Error handling for API calls
- Validation of environment variables
- Retry logic for transient failures
Add error handling:
export async function syncMember(memberId: string): Promise<void> {
+ const apiUrl = process.env['CROWD_SEARCH_SYNC_API_URL']
+ if (!apiUrl) {
+ throw new Error('CROWD_SEARCH_SYNC_API_URL environment variable is not set')
+ }
+
const syncApi = new SearchSyncApiClient({
- baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'],
+ baseUrl: apiUrl,
})
- await syncApi.triggerMemberSync(memberId, { withAggs: false })
+ try {
+ await syncApi.triggerMemberSync(memberId, { withAggs: false })
+ } catch (error) {
+ svc.log.error({ error, memberId }, 'Failed to sync member')
+ throw error
+ }
}
Committable suggestion skipped: line range outside the PR's diff.
isHighConfidenceSourceSelectedForWorkExperiences: boolean, | ||
): IWorkExperienceChanges { | ||
// we delete all the work experiences that were not manually created | ||
const toDelete = oldVersion.filter((c) => c.source !== OrganizationSource.UI) | ||
let toDelete = oldVersion.filter((c) => c.source !== OrganizationSource.UI) | ||
|
||
const toCreate: IMemberEnrichmentDataNormalizedOrganization[] = [] | ||
// eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
const toUpdate: Map<IMemberOrganizationData, Record<string, any>> = new Map() | ||
|
||
if (isHighConfidenceSourceSelectedForWorkExperiences) { | ||
toDelete = oldVersion | ||
toCreate.push(...newVersion) | ||
return { | ||
toDelete, | ||
toCreate, | ||
toUpdate, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document the high confidence override behavior and add safeguards.
The isHighConfidenceSourceSelectedForWorkExperiences
parameter allows overriding all existing work experiences, including manually created ones. This is a critical change that needs:
- Documentation explaining when this override should be used
- Safeguards to prevent accidental data loss
- Logging of override events for audit purposes
Add documentation and logging:
function prepareWorkExperiences(
oldVersion: IMemberOrganizationData[],
newVersion: IMemberEnrichmentDataNormalizedOrganization[],
isHighConfidenceSourceSelectedForWorkExperiences: boolean,
): IWorkExperienceChanges {
+ // When isHighConfidenceSourceSelectedForWorkExperiences is true, all existing work experiences
+ // will be replaced with the new version, including manually created ones. This should only be
+ // used when the source data is highly reliable and complete.
+ svc.log.info(
+ { isHighConfidenceSourceSelectedForWorkExperiences },
+ 'Processing work experiences with override mode'
+ )
Committable suggestion skipped: line range outside the PR's diff.
const match = orderedNewVersion.find( | ||
(e) => | ||
e.title === current.jobTitle && | ||
e.identities && | ||
e.identities.some((e) => e.organizationId === current.orgId), | ||
) | ||
if (!match) { | ||
// if we didn't find a match by title we should check dates | ||
match = orderedNewVersion.find( | ||
(e) => | ||
dateIntersects(current.dateStart, current.dateEnd, e.startDate, e.endDate) && | ||
e.identities && | ||
e.identities.some((e) => e.organizationId === current.orgId), | ||
) | ||
} | ||
|
||
// if we found a match we can check if we need something to update | ||
if (match) { | ||
// eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
if ( | ||
match && | ||
current.dateStart === match.startDate && | ||
current.dateEnd === null && | ||
match.endDate !== null | ||
) { | ||
const toUpdateInner: Record<string, any> = {} | ||
|
||
// lets check if the dates and title are the same otherwise we need to update them | ||
if (current.dateStart !== match.startDate) { | ||
toUpdateInner.dateStart = match.startDate | ||
} | ||
|
||
if (current.dateEnd !== match.endDate) { | ||
toUpdateInner.dateEnd = match.endDate | ||
} | ||
|
||
if (current.jobTitle !== match.title) { | ||
toUpdateInner.title = match.title | ||
} | ||
|
||
if (Object.keys(toUpdateInner).length > 0) { | ||
toUpdate.set(current, toUpdateInner) | ||
} | ||
toUpdateInner.dateEnd = match.endDate | ||
toUpdate.set(current, toUpdateInner) | ||
|
||
// remove the match from the new version array so we later don't process it again | ||
orderedNewVersion = orderedNewVersion.filter((e) => e.id !== match.id) | ||
} else if ( | ||
match && | ||
(current.dateStart !== match.startDate || current.dateEnd !== null || match.endDate === null) | ||
) { | ||
// there's an incoming work experiences, but it's conflicting with the existing manually updated data | ||
// we shouldn't add or update anything when this happens | ||
// we can only update dateEnd of existing manually changed data, when it has a null dateEnd | ||
orderedNewVersion = orderedNewVersion.filter((e) => e.id !== match.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve error handling for work experience matching.
The work experience matching logic needs better error handling:
- Validation of input data structures
- Handling of edge cases where identities array is undefined
- Logging of matching decisions for debugging
Add error handling:
const match = orderedNewVersion.find(
(e) =>
e.title === current.jobTitle &&
- e.identities &&
- e.identities.some((e) => e.organizationId === current.orgId),
+ Array.isArray(e.identities) &&
+ e.identities.some((identity) => {
+ if (!identity || !identity.organizationId) {
+ svc.log.warn({ experience: e }, 'Invalid identity data in work experience')
+ return false
+ }
+ return identity.organizationId === current.orgId
+ }),
)
+ svc.log.debug(
+ { current, match },
+ match ? 'Found matching work experience' : 'No matching work experience found'
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
const match = orderedNewVersion.find( | |
(e) => | |
e.title === current.jobTitle && | |
e.identities && | |
e.identities.some((e) => e.organizationId === current.orgId), | |
) | |
if (!match) { | |
// if we didn't find a match by title we should check dates | |
match = orderedNewVersion.find( | |
(e) => | |
dateIntersects(current.dateStart, current.dateEnd, e.startDate, e.endDate) && | |
e.identities && | |
e.identities.some((e) => e.organizationId === current.orgId), | |
) | |
} | |
// if we found a match we can check if we need something to update | |
if (match) { | |
// eslint-disable-next-line @typescript-eslint/no-explicit-any | |
if ( | |
match && | |
current.dateStart === match.startDate && | |
current.dateEnd === null && | |
match.endDate !== null | |
) { | |
const toUpdateInner: Record<string, any> = {} | |
// lets check if the dates and title are the same otherwise we need to update them | |
if (current.dateStart !== match.startDate) { | |
toUpdateInner.dateStart = match.startDate | |
} | |
if (current.dateEnd !== match.endDate) { | |
toUpdateInner.dateEnd = match.endDate | |
} | |
if (current.jobTitle !== match.title) { | |
toUpdateInner.title = match.title | |
} | |
if (Object.keys(toUpdateInner).length > 0) { | |
toUpdate.set(current, toUpdateInner) | |
} | |
toUpdateInner.dateEnd = match.endDate | |
toUpdate.set(current, toUpdateInner) | |
// remove the match from the new version array so we later don't process it again | |
orderedNewVersion = orderedNewVersion.filter((e) => e.id !== match.id) | |
} else if ( | |
match && | |
(current.dateStart !== match.startDate || current.dateEnd !== null || match.endDate === null) | |
) { | |
// there's an incoming work experiences, but it's conflicting with the existing manually updated data | |
// we shouldn't add or update anything when this happens | |
// we can only update dateEnd of existing manually changed data, when it has a null dateEnd | |
orderedNewVersion = orderedNewVersion.filter((e) => e.id !== match.id) | |
const match = orderedNewVersion.find( | |
(e) => | |
e.title === current.jobTitle && | |
Array.isArray(e.identities) && | |
e.identities.some((identity) => { | |
if (!identity || !identity.organizationId) { | |
svc.log.warn({ experience: e }, 'Invalid identity data in work experience') | |
return false | |
} | |
return identity.organizationId === current.orgId | |
}), | |
) | |
svc.log.debug( | |
{ current, match }, | |
match ? 'Found matching work experience' : 'No matching work experience found' | |
) | |
// if we found a match we can check if we need something to update | |
if ( | |
match && | |
current.dateStart === match.startDate && | |
current.dateEnd === null && | |
match.endDate !== null | |
) { | |
const toUpdateInner: Record<string, any> = {} | |
toUpdateInner.dateEnd = match.endDate | |
toUpdate.set(current, toUpdateInner) | |
// remove the match from the new version array so we later don't process it again | |
orderedNewVersion = orderedNewVersion.filter((e) => e.id !== match.id) | |
} else if ( | |
match && | |
(current.dateStart !== match.startDate || current.dateEnd !== null || match.endDate === null) | |
) { | |
// there's an incoming work experiences, but it's conflicting with the existing manually updated data | |
// we shouldn't add or update anything when this happens | |
// we can only update dateEnd of existing manually changed data, when it has a null dateEnd | |
orderedNewVersion = orderedNewVersion.filter((e) => e.id !== match.id) |
// try matching member's existing organizations with the new ones | ||
// we'll be using displayName, title, dates | ||
for (const org of squashedPayload.memberOrganizations) { | ||
if (!org.organizationId) { | ||
// Check if any similar in existing work experiences | ||
const existingOrg = existingMemberData.organizations.find((o) => { | ||
const incomingOrgStartDate = org.startDate ? new Date(org.startDate) : null | ||
const incomingOrgEndDate = org.endDate ? new Date(org.endDate) : null | ||
const existingOrgStartDate = o.dateStart ? new Date(o.dateStart) : null | ||
const existingOrgEndEndDate = o.dateEnd ? new Date(o.dateEnd) : null | ||
|
||
const isSameStartMonthYear = | ||
(!incomingOrgStartDate && !existingOrgStartDate) || // Both start dates are null | ||
(incomingOrgStartDate && | ||
existingOrgStartDate && | ||
incomingOrgStartDate.getMonth() === existingOrgStartDate.getMonth() && | ||
incomingOrgStartDate.getFullYear() === existingOrgStartDate.getFullYear()) | ||
|
||
const isSameEndMonthYear = | ||
(!incomingOrgEndDate && !existingOrgEndEndDate) || // Both end dates are null | ||
(incomingOrgEndDate && | ||
existingOrgEndEndDate && | ||
incomingOrgEndDate.getMonth() === existingOrgEndEndDate.getMonth() && | ||
incomingOrgEndDate.getFullYear() === existingOrgEndEndDate.getFullYear()) | ||
|
||
return ( | ||
(o.orgName.toLowerCase().includes(org.name.toLowerCase()) || | ||
org.name.toLowerCase().includes(o.orgName.toLowerCase())) && | ||
((isSameStartMonthYear && isSameEndMonthYear) || org.title === o.jobTitle) | ||
) | ||
}) | ||
|
||
if (existingOrg) { | ||
// Get all orgs with the same name as the current one | ||
const matchingOrgs = squashedPayload.memberOrganizations.filter( | ||
(otherOrg) => otherOrg.name === org.name, | ||
) | ||
|
||
// Set organizationId for all matching orgs | ||
for (const matchingOrg of matchingOrgs) { | ||
matchingOrg.organizationId = existingOrg.orgId | ||
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Refactor organization matching logic for better maintainability.
The organization matching logic is complex and could benefit from:
- Extracting date comparison logic into a helper function
- Using a more robust name comparison (e.g., Levenshtein distance)
- Adding comments explaining the matching criteria
Consider refactoring like this:
+function isSameMonthYear(date1: Date | null, date2: Date | null): boolean {
+ return (!date1 && !date2) || // Both dates are null
+ (date1 && date2 &&
+ date1.getMonth() === date2.getMonth() &&
+ date1.getFullYear() === date2.getFullYear())
+}
+
+function areOrganizationNamesSimilar(name1: string, name2: string): boolean {
+ // TODO: Implement Levenshtein distance comparison
+ return name1.toLowerCase().includes(name2.toLowerCase()) ||
+ name2.toLowerCase().includes(name1.toLowerCase())
+}
// Check if any similar in existing work experiences
const existingOrg = existingMemberData.organizations.find((o) => {
const incomingOrgStartDate = org.startDate ? new Date(org.startDate) : null
const incomingOrgEndDate = org.endDate ? new Date(org.endDate) : null
const existingOrgStartDate = o.dateStart ? new Date(o.dateStart) : null
const existingOrgEndEndDate = o.dateEnd ? new Date(o.dateEnd) : null
- const isSameStartMonthYear =
- (!incomingOrgStartDate && !existingOrgStartDate) || // Both start dates are null
- (incomingOrgStartDate &&
- existingOrgStartDate &&
- incomingOrgStartDate.getMonth() === existingOrgStartDate.getMonth() &&
- incomingOrgStartDate.getFullYear() === existingOrgStartDate.getFullYear())
+ const isSameStartMonthYear = isSameMonthYear(incomingOrgStartDate, existingOrgStartDate)
+ const isSameEndMonthYear = isSameMonthYear(incomingOrgEndDate, existingOrgEndEndDate)
return (
- (o.orgName.toLowerCase().includes(org.name.toLowerCase()) ||
- org.name.toLowerCase().includes(o.orgName.toLowerCase())) &&
+ areOrganizationNamesSimilar(o.orgName, org.name) &&
((isSameStartMonthYear && isSameEndMonthYear) || org.title === o.jobTitle)
)
})
Committable suggestion skipped: line range outside the PR's diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (2)
239-239
: Consider using structured logging.The logging changes from
info
todebug
are consistent, but could benefit from structured logging patterns.Consider enhancing the logging:
-svc.log.debug({ memberId }, 'Adding to member identities!') +svc.log.debug({ + memberId, + action: 'add_member_identities', + identityCount: squashedPayload.identities.length +}, 'Adding member identities')Also applies to: 258-258, 290-290, 308-308
330-351
: Enhance organization matching logic with early returns.The nested organization matching logic could be more maintainable with early returns and clearer variable names.
Consider restructuring:
- // try matching member's existing organizations with the new ones - // we'll be using displayName, title, dates - for (const org of squashedPayload.memberOrganizations) { - if (!org.organizationId) { - // Check if any similar in existing work experiences - const existingOrg = existingMemberData.organizations.find((o) => - doesIncomingOrgExistInExistingOrgs(o, org), - ) - - if (existingOrg) { - // Get all orgs with the same name as the current one - const matchingOrgs = squashedPayload.memberOrganizations.filter( - (otherOrg) => otherOrg.name === org.name, - ) - - // Set organizationId for all matching orgs - for (const matchingOrg of matchingOrgs) { - matchingOrg.organizationId = existingOrg.orgId - } - } - } - } + // Match member's existing organizations with new ones using displayName, title, and dates + for (const incomingOrg of squashedPayload.memberOrganizations) { + // Skip if organization is already identified + if (incomingOrg.organizationId) continue; + + // Find matching existing organization + const matchingExistingOrg = existingMemberData.organizations.find(existingOrg => + doesIncomingOrgExistInExistingOrgs(existingOrg, incomingOrg) + ); + + if (!matchingExistingOrg) continue; + + // Update all organizations with the same name + const relatedOrgs = squashedPayload.memberOrganizations.filter( + org => org.name === incomingOrg.name + ); + + relatedOrgs.forEach(org => { + org.organizationId = matchingExistingOrg.orgId; + }); + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
services/apps/members_enrichment_worker/src/activities/enrichment.ts
(20 hunks)services/libs/common/src/array.ts
(1 hunks)services/libs/data-access-layer/src/old/apps/members_enrichment_worker/index.ts
(3 hunks)services/libs/types/src/enrichment.ts
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- services/libs/data-access-layer/src/old/apps/members_enrichment_worker/index.ts
🔇 Additional comments (6)
services/libs/types/src/enrichment.ts (2)
3-3
: LGTM! Import statement is properly placed and used.
The import of IOrganizationIdentity
is correctly placed with other related imports and is utilized within the file.
43-43
: LGTM! Interface extension enhances organization identity tracking.
The optional identities
property is well-designed:
- Uses consistent naming with other interfaces
- Properly typed as an array of organization identities
- Maintains backward compatibility by being optional
Let's verify the usage of this new property:
✅ Verification successful
Based on the search results, I can now generate the final response:
The optional identities property is well-integrated and actively used across the codebase
The property is properly integrated into the organization-related functionality:
- Used in member enrichment worker for organization data processing
- Utilized in organization similarity calculations for merge suggestions
- Implemented in organization repositories for identity management
- Integrated with audit logging for tracking identity changes
- Applied in organization sync services for data synchronization
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check how the new identities property is being used across the codebase
# Search for IMemberOrganizationData usage
rg -t typescript "IMemberOrganizationData" -A 5
# Search for specific usage of the identities property
rg -t typescript "organizations.*identities" -A 5
Length of output: 169
Script:
#!/bin/bash
# Let's try without typescript file type flag since it's not recognized
# Search for IMemberOrganizationData interface usage
rg "IMemberOrganizationData" -A 5
# Search for files containing organization identities
rg "organizations.*identities" -A 5
# Also search for direct usage of IOrganizationIdentity
rg "IOrganizationIdentity" -A 5
Length of output: 33257
services/apps/members_enrichment_worker/src/activities/enrichment.ts (4)
3-8
: LGTM! Clean import organization.
The imports are well-organized and the new utilities from @crowd/common
are properly imported.
934-940
:
Add error handling for sync API calls.
The syncMember
function needs proper error handling and environment validation.
942-947
:
Add error handling for organization sync API calls.
The syncOrganization
function needs proper error handling and environment validation.
Apply similar error handling as suggested for syncMember
:
export async function syncOrganization(organizationId: string): Promise<void> {
+ const apiUrl = process.env['CROWD_SEARCH_SYNC_API_URL']
+ if (!apiUrl) {
+ throw new Error('CROWD_SEARCH_SYNC_API_URL environment variable is not set')
+ }
+
const syncApi = new SearchSyncApiClient({
- baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'],
+ baseUrl: apiUrl,
})
- await syncApi.triggerOrganizationSync(organizationId, undefined, { withAggs: false })
+ try {
+ await syncApi.triggerOrganizationSync(organizationId, undefined, { withAggs: false })
+ } catch (error) {
+ svc.log.error({ error, organizationId }, 'Failed to sync organization')
+ throw error
+ }
}
Likely invalid or redundant comment.
439-480
: 🛠️ Refactor suggestion
Extract date comparison logic into helper functions.
The doesIncomingOrgExistInExistingOrgs
function has complex date comparison logic that could be simplified.
Consider extracting helper functions:
+function getVerifiedPrimaryDomains(identities: any[]): string[] {
+ return identities
+ .filter(i => i.type === OrganizationIdentityType.PRIMARY_DOMAIN && i.verified)
+ .map(i => i.value);
+}
+
+function isSameMonthYear(date1: Date | null, date2: Date | null): boolean {
+ if (!date1 && !date2) return true;
+ if (!date1 || !date2) return false;
+ return date1.getMonth() === date2.getMonth() &&
+ date1.getFullYear() === date2.getFullYear();
+}
+
+function areOrganizationNamesSimilar(name1: string, name2: string): boolean {
+ const n1 = name1.toLowerCase();
+ const n2 = name2.toLowerCase();
+ return n1.includes(n2) || n2.includes(n1);
+}
+
export function doesIncomingOrgExistInExistingOrgs(
existingOrg: IMemberOrganizationData,
incomingOrg: IMemberEnrichmentDataNormalizedOrganization,
): boolean {
- const incomingVerifiedPrimaryDomainIdentityValues = incomingOrg.identities
- .filter((i) => i.type === OrganizationIdentityType.PRIMARY_DOMAIN && i.verified)
- .map((i) => i.value)
-
- const existingVerifiedPrimaryDomainIdentityValues = existingOrg.identities
- .filter((i) => i.type === OrganizationIdentityType.PRIMARY_DOMAIN && i.verified)
- .map((i) => i.value)
+ const incomingDomains = getVerifiedPrimaryDomains(incomingOrg.identities);
+ const existingDomains = getVerifiedPrimaryDomains(existingOrg.identities);
const incomingOrgStartDate = incomingOrg.startDate ? new Date(incomingOrg.startDate) : null
const incomingOrgEndDate = incomingOrg.endDate ? new Date(incomingOrg.endDate) : null
const existingOrgStartDate = existingOrg.dateStart ? new Date(existingOrg.dateStart) : null
const existingOrgEndEndDate = existingOrg.dateEnd ? new Date(existingOrg.dateEnd) : null
- const isSameStartMonthYear =
- (!incomingOrgStartDate && !existingOrgStartDate) ||
- (incomingOrgStartDate &&
- existingOrgStartDate &&
- incomingOrgStartDate.getMonth() === existingOrgStartDate.getMonth() &&
- incomingOrgStartDate.getFullYear() === existingOrgStartDate.getFullYear())
-
- const isSameEndMonthYear =
- (!incomingOrgEndDate && !existingOrgEndEndDate) ||
- (incomingOrgEndDate &&
- existingOrgEndEndDate &&
- incomingOrgEndDate.getMonth() === existingOrgEndEndDate.getMonth() &&
- incomingOrgEndDate.getFullYear() === existingOrgEndEndDate.getFullYear())
+ const hasSameStartDate = isSameMonthYear(incomingOrgStartDate, existingOrgStartDate);
+ const hasSameEndDate = isSameMonthYear(incomingOrgEndDate, existingOrgEndEndDate);
return (
- hasIntersection(
- incomingVerifiedPrimaryDomainIdentityValues,
- existingVerifiedPrimaryDomainIdentityValues,
- ) ||
- ((existingOrg.orgName.toLowerCase().includes(incomingOrg.name.toLowerCase()) ||
- incomingOrg.name.toLowerCase().includes(existingOrg.orgName.toLowerCase())) &&
- ((isSameStartMonthYear && isSameEndMonthYear) || incomingOrg.title === existingOrg.jobTitle))
+ hasIntersection(incomingDomains, existingDomains) ||
+ (areOrganizationNamesSimilar(existingOrg.orgName, incomingOrg.name) &&
+ ((hasSameStartDate && hasSameEndDate) || incomingOrg.title === existingOrg.jobTitle))
)
}
Likely invalid or redundant comment.
export const hasIntersection = (arr1: string[], arr2: string[]): boolean => { | ||
const set1 = new Set(arr1) | ||
return arr2.some((item) => set1.has(item)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add input validation and edge case handling.
The hasIntersection
function implementation is efficient using Set, but could be more robust with additional checks.
Consider this enhanced implementation:
-export const hasIntersection = (arr1: string[], arr2: string[]): boolean => {
- const set1 = new Set(arr1)
- return arr2.some((item) => set1.has(item))
-}
+export const hasIntersection = (arr1: string[], arr2: string[]): boolean => {
+ // Validate inputs
+ if (!Array.isArray(arr1) || !Array.isArray(arr2)) {
+ throw new TypeError('Both arguments must be arrays')
+ }
+
+ // Handle empty arrays
+ if (arr1.length === 0 || arr2.length === 0) {
+ return false
+ }
+
+ // Create Set from the smaller array for better performance
+ const [smallerArr, largerArr] = arr1.length < arr2.length ? [arr1, arr2] : [arr2, arr1]
+ const set = new Set(smallerArr.map(item => item?.toLowerCase()))
+
+ // Check for intersection with case-insensitive comparison
+ return largerArr.some(item => item && set.has(item.toLowerCase()))
+}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
export const hasIntersection = (arr1: string[], arr2: string[]): boolean => { | |
const set1 = new Set(arr1) | |
return arr2.some((item) => set1.has(item)) | |
} | |
export const hasIntersection = (arr1: string[], arr2: string[]): boolean => { | |
// Validate inputs | |
if (!Array.isArray(arr1) || !Array.isArray(arr2)) { | |
throw new TypeError('Both arguments must be arrays') | |
} | |
// Handle empty arrays | |
if (arr1.length === 0 || arr2.length === 0) { | |
return false | |
} | |
// Create Set from the smaller array for better performance | |
const [smallerArr, largerArr] = arr1.length < arr2.length ? [arr1, arr2] : [arr2, arr1] | |
const set = new Set(smallerArr.map(item => item?.toLowerCase())) | |
// Check for intersection with case-insensitive comparison | |
return largerArr.some(item => item && set.has(item.toLowerCase())) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (2)
445-485
: Refactor date comparison logic for better maintainabilityThe date comparison logic is duplicated and could be extracted into a helper function for better maintainability.
Consider this refactoring:
+function isSameMonthYear(date1: Date | null, date2: Date | null): boolean { + return (!date1 && !date2) || // Both dates are null + (date1 && date2 && + date1.getMonth() === date2.getMonth() && + date1.getFullYear() === date2.getFullYear()) +} export function doesIncomingOrgExistInExistingOrgs( existingOrg: IMemberOrganizationData, incomingOrg: IMemberEnrichmentDataNormalizedOrganization, ): boolean { // ... existing identity filtering code ... const incomingOrgStartDate = incomingOrg.startDate ? new Date(incomingOrg.startDate) : null const incomingOrgEndDate = incomingOrg.endDate ? new Date(incomingOrg.endDate) : null const existingOrgStartDate = existingOrg.dateStart ? new Date(existingOrg.dateStart) : null const existingOrgEndEndDate = existingOrg.dateEnd ? new Date(existingOrg.dateEnd) : null - const isSameStartMonthYear = - (!incomingOrgStartDate && !existingOrgStartDate) || // Both start dates are null - (incomingOrgStartDate && - existingOrgStartDate && - incomingOrgStartDate.getMonth() === existingOrgStartDate.getMonth() && - incomingOrgStartDate.getFullYear() === existingOrgStartDate.getFullYear()) + const isSameStartMonthYear = isSameMonthYear(incomingOrgStartDate, existingOrgStartDate) + const isSameEndMonthYear = isSameMonthYear(incomingOrgEndDate, existingOrgEndEndDate) - const isSameEndMonthYear = - (!incomingOrgEndDate && !existingOrgEndEndDate) || // Both end dates are null - (incomingOrgEndDate && - existingOrgEndEndDate && - incomingOrgEndDate.getMonth() === existingOrgEndEndDate.getMonth() && - incomingOrgEndDate.getFullYear() === existingOrgEndEndDate.getFullYear()) return ( hasIntersection( incomingVerifiedPrimaryDomainIdentityValues, existingVerifiedPrimaryDomainIdentityValues, ) || ((existingOrg.orgName.toLowerCase().includes(incomingOrg.name.toLowerCase()) || incomingOrg.name.toLowerCase().includes(existingOrg.orgName.toLowerCase())) && ((isSameStartMonthYear && isSameEndMonthYear) || incomingOrg.title === existingOrg.jobTitle)) ) }
Line range hint
847-925
: Consider splitting the work experience preparation logicThe prepareWorkExperiences function is becoming complex with multiple responsibilities. Consider splitting it into smaller, focused functions.
Suggested structure:
interface IWorkExperiencePreparation { processHighConfidenceSource(): IWorkExperienceChanges; processManualChanges(): IWorkExperienceChanges; processNewExperiences(): IWorkExperienceChanges; } class WorkExperiencePreparation implements IWorkExperiencePreparation { constructor( private oldVersion: IMemberOrganizationData[], private newVersion: IMemberEnrichmentDataNormalizedOrganization[], private isHighConfidenceSource: boolean ) {} prepare(): IWorkExperienceChanges { if (this.isHighConfidenceSource) { return this.processHighConfidenceSource(); } return this.processRegularSource(); } private processRegularSource(): IWorkExperienceChanges { const manualChanges = this.processManualChanges(); const newExperiences = this.processNewExperiences(); return this.mergeChanges(manualChanges, newExperiences); } // ... implement other methods }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
services/apps/members_enrichment_worker/src/activities/enrichment.ts
(19 hunks)
🔇 Additional comments (5)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (5)
3-8
: LGTM!
The new imports are properly aligned with the added functionality and follow the module's organization pattern.
239-239
: LGTM: Logging level adjustments
The change from info to debug level is appropriate for these operational logs.
Also applies to: 258-258, 290-290, 308-308
940-953
:
Add error handling and environment validation for sync functions
The sync functions need proper error handling and environment validation.
Apply this diff:
+function validateSyncApiUrl(): string {
+ const apiUrl = process.env['CROWD_SEARCH_SYNC_API_URL'];
+ if (!apiUrl) {
+ throw new Error('CROWD_SEARCH_SYNC_API_URL environment variable is not set');
+ }
+ return apiUrl;
+}
export async function syncMember(memberId: string): Promise<void> {
+ const apiUrl = validateSyncApiUrl();
const syncApi = new SearchSyncApiClient({
- baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'],
+ baseUrl: apiUrl,
})
- await syncApi.triggerMemberSync(memberId, { withAggs: false })
+ try {
+ await syncApi.triggerMemberSync(memberId, { withAggs: false })
+ } catch (error) {
+ svc.log.error({ error, memberId }, 'Failed to sync member')
+ throw error
+ }
}
export async function syncOrganization(organizationId: string): Promise<void> {
+ const apiUrl = validateSyncApiUrl();
const syncApi = new SearchSyncApiClient({
- baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'],
+ baseUrl: apiUrl,
})
- await syncApi.triggerOrganizationSync(organizationId, undefined, { withAggs: false })
+ try {
+ await syncApi.triggerOrganizationSync(organizationId, undefined, { withAggs: false })
+ } catch (error) {
+ svc.log.error({ error, organizationId }, 'Failed to sync organization')
+ throw error
+ }
}
426-438
:
Add error handling for sync operations
The sync operations after member updates lack error handling. A failure in sync operations could leave the system in an inconsistent state.
Apply this diff:
- await Promise.all(promises)
+ try {
+ await Promise.all(promises)
+ } catch (error) {
+ svc.log.error({ error, memberId }, 'Failed to process member updates')
+ throw error
+ }
if (updated) {
await setMemberEnrichmentUpdateDateDb(tx.transaction(), memberId)
- await syncMember(memberId)
+ try {
+ await syncMember(memberId)
+ } catch (error) {
+ svc.log.error({ error, memberId }, 'Failed to sync member')
+ throw error
+ }
} else {
await setMemberEnrichmentTryDateDb(tx.transaction(), memberId)
}
- for (const orgId of orgIdsToSync) {
- await syncOrganization(orgId)
+ try {
+ await Promise.all(orgIdsToSync.map(orgId => syncOrganization(orgId)))
+ } catch (error) {
+ svc.log.error({ error, orgIds: orgIdsToSync }, 'Failed to sync organizations')
+ throw error
+ }
Likely invalid or redundant comment.
230-231
: Verify high confidence source handling
The new parameter isHighConfidenceSourceSelectedForWorkExperiences
has significant impact on work experience data. Let's verify its usage across the codebase.
✅ Verification successful
Parameter usage is properly implemented and contained
The isHighConfidenceSourceSelectedForWorkExperiences
parameter is used correctly within a well-defined scope:
- It's passed from
updateMemberUsingSquashedPayload
toprepareWorkExperiences
- In
prepareWorkExperiences
, it controls whether to completely replace work experiences (when true) or preserve manually created ones (when false) - The implementation is self-contained and doesn't require changes in other parts of the codebase
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check how isHighConfidenceSourceSelectedForWorkExperiences is used
# Expected: Find all calls to updateMemberUsingSquashedPayload to verify proper parameter passing
rg -A 5 "updateMemberUsingSquashedPayload.*isHighConfidenceSourceSelectedForWorkExperiences"
Length of output: 94
Script:
#!/bin/bash
# Let's first find the file containing the function definition
fd "enrichment.ts" --exec ast-grep --pattern 'function updateMemberUsingSquashedPayload($_) {
$$$
}'
# Also search for any references to this parameter
rg "isHighConfidenceSourceSelectedForWorkExperiences"
Length of output: 670
Script:
#!/bin/bash
# Let's see the context around these usages to understand the implementation
rg -B 5 -A 10 "isHighConfidenceSourceSelectedForWorkExperiences" services/apps/members_enrichment_worker/src/activities/enrichment.ts
Length of output: 2094
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
services/libs/opensearch/src/service/organization.sync.service.ts (1)
417-422
: Add error logging for skipped organizationsThe null check is a good addition to prevent processing undefined organizations. However, consider adding error logging when an organization is skipped to help with debugging.
if (base) { const data = await buildFullOrgForMergeSuggestions(qx, base) const prefixed = OrganizationSyncService.prefixData(data) await this.openSearchService.index(orgId, OpenSearchIndex.ORGANIZATIONS, prefixed) +} else { + this.log.warn({ orgId }, 'Organization not found, skipping OpenSearch indexing') }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
services/libs/opensearch/src/service/organization.sync.service.ts
(1 hunks)
🔇 Additional comments (1)
services/libs/opensearch/src/service/organization.sync.service.ts (1)
417-422
: Verify member enrichment impact
Since this change is part of member enrichment tweaks, let's verify that skipping non-existent organizations doesn't affect member processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (2)
853-854
: Document the high confidence source parameter.The
isHighConfidenceSourceSelectedForWorkExperiences
parameter needs documentation explaining its purpose and impact on work experience processing.Add JSDoc comment:
+/** + * Prepares work experiences for update by comparing old and new versions. + * @param oldVersion - Existing work experiences in the database + * @param newVersion - New work experiences from enrichment + * @param isHighConfidenceSourceSelectedForWorkExperiences - When true, replaces all existing work experiences with new ones + * @returns Object containing work experiences to delete, create, and update + */ function prepareWorkExperiences(
919-920
: Improve type safety in work experience updates.Using
Record<string, any>
reduces type safety. Consider creating a specific interface for work experience updates.+interface IWorkExperienceUpdate { + dateEnd?: string | null; + // add other possible update fields +} -const toUpdateInner: Record<string, any> = {} +const toUpdateInner: IWorkExperienceUpdate = {}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
services/apps/members_enrichment_worker/src/activities/enrichment.ts
(19 hunks)
🔇 Additional comments (3)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (3)
3-8
: LGTM: Import changes are well-structured.
The new imports from @crowd/common are properly added and utilized throughout the code.
946-959
:
Add error handling and environment validation for sync functions.
The sync functions need proper error handling and environment variable validation.
This issue was previously identified in a past review. The same fixes for error handling and environment validation should be applied here.
239-239
: Verify logging visibility after level changes.
The change from info
to debug
level is appropriate for these operational logs. However, ensure that these events are still adequately visible for troubleshooting when needed.
Also applies to: 258-258, 290-290, 308-308
✅ Verification successful
Based on the log level usage patterns shown in the codebase, I can now generate the final response:
Log level changes align with codebase patterns
The change from info
to debug
for these operational logs is consistent with the codebase's logging patterns where:
debug
is used for detailed operational tracing (e.g., query execution, message processing)info
is used for significant state changes and important operationswarn
is used for concerning but non-critical issueserror
is used for failures and critical issues
The changed logs are appropriately set to debug
as they trace routine member identity processing operations, similar to other debug-level operational logs in the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for other log levels in the codebase to ensure consistent logging patterns
rg -A 1 "log\.(info|debug|warn|error)" --type ts
Length of output: 70664
orgIdsToSync.map((orgId) => | ||
syncOrganization(orgId).catch((error) => { | ||
console.error(`Failed to sync organization with ID ${orgId}:`, error) | ||
}), | ||
), | ||
), | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve error handling in organization sync operations.
The current error handling has several issues:
- Uses
console.error
instead of the service logger (svc.log.error
) - Silently catches errors without proper error propagation
Apply this fix:
orgIdsToSync.map((orgId) =>
- syncOrganization(orgId).catch((error) => {
- console.error(`Failed to sync organization with ID ${orgId}:`, error)
- }),
+ syncOrganization(orgId).catch((error) => {
+ svc.log.error({ error, orgId }, 'Failed to sync organization')
+ throw error
+ }),
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
orgIdsToSync.map((orgId) => | |
syncOrganization(orgId).catch((error) => { | |
console.error(`Failed to sync organization with ID ${orgId}:`, error) | |
}), | |
), | |
), | |
), | |
orgIdsToSync.map((orgId) => | |
syncOrganization(orgId).catch((error) => { | |
svc.log.error({ error, orgId }, 'Failed to sync organization') | |
throw error | |
}), | |
), | |
), | |
), |
…'s already a profile from verified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (1)
453-494
: Refactor organization matching logic for better maintainability.The organization matching logic is complex and could benefit from:
- Extracting date comparison logic into helper functions
- Adding comments explaining the matching criteria
Consider this refactor:
+function isSameMonthYear(date1: Date | null, date2: Date | null): boolean { + return (!date1 && !date2) || // Both dates are null + (date1 && date2 && + date1.getMonth() === date2.getMonth() && + date1.getFullYear() === date2.getFullYear()) +} export function doesIncomingOrgExistInExistingOrgs( existingOrg: IMemberOrganizationData, incomingOrg: IMemberEnrichmentDataNormalizedOrganization, ): boolean { // Extract domain identities for comparison const incomingVerifiedPrimaryDomainIdentityValues = incomingOrg.identities .filter((i) => i.type === OrganizationIdentityType.PRIMARY_DOMAIN && i.verified) .map((i) => i.value) const existingVerifiedPrimaryDomainIdentityValues = existingOrg.identities .filter((i) => i.type === OrganizationIdentityType.PRIMARY_DOMAIN && i.verified) .map((i) => i.value) const incomingOrgStartDate = incomingOrg.startDate ? new Date(incomingOrg.startDate) : null const incomingOrgEndDate = incomingOrg.endDate ? new Date(incomingOrg.endDate) : null const existingOrgStartDate = existingOrg.dateStart ? new Date(existingOrg.dateStart) : null const existingOrgEndEndDate = existingOrg.dateEnd ? new Date(existingOrg.dateEnd) : null - const isSameStartMonthYear = - (!incomingOrgStartDate && !existingOrgStartDate) || // Both start dates are null - (incomingOrgStartDate && - existingOrgStartDate && - incomingOrgStartDate.getMonth() === existingOrgStartDate.getMonth() && - incomingOrgStartDate.getFullYear() === existingOrgStartDate.getFullYear()) + const isSameStartMonthYear = isSameMonthYear(incomingOrgStartDate, existingOrgStartDate) + const isSameEndMonthYear = isSameMonthYear(incomingOrgEndDate, existingOrgEndEndDate) return ( hasIntersection( incomingVerifiedPrimaryDomainIdentityValues, existingVerifiedPrimaryDomainIdentityValues, ) || ((existingOrg.orgName.toLowerCase().includes(incomingOrg.name.toLowerCase()) || incomingOrg.name.toLowerCase().includes(existingOrg.orgName.toLowerCase())) && ((isSameStartMonthYear && isSameEndMonthYear) || incomingOrg.title === existingOrg.jobTitle)) ) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
services/apps/members_enrichment_worker/src/activities/enrichment.ts
(18 hunks)
🔇 Additional comments (3)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (3)
Line range hint 3-46
: LGTM: Import statements are properly organized.
The new imports are correctly added and are used throughout the code.
381-387
:
Improve error handling in organization sync operations.
The current error handling has several issues:
- Uses
console.error
instead of the service logger - Silently catches errors without proper error propagation
Apply this fix:
orgIdsToSync.map((orgId) =>
- syncOrganization(orgId).catch((error) => {
- console.error(`Failed to sync organization with ID ${orgId}:`, error)
- }),
+ syncOrganization(orgId).catch((error) => {
+ svc.log.error({ error, orgId }, 'Failed to sync organization')
+ throw error
+ }),
Likely invalid or redundant comment.
953-966
:
Add error handling and environment validation for sync functions.
The sync functions need:
- Error handling for API calls
- Validation of environment variables
Add error handling:
export async function syncMember(memberId: string): Promise<void> {
+ const apiUrl = process.env['CROWD_SEARCH_SYNC_API_URL']
+ if (!apiUrl) {
+ throw new Error('CROWD_SEARCH_SYNC_API_URL environment variable is not set')
+ }
+
const syncApi = new SearchSyncApiClient({
- baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'],
+ baseUrl: apiUrl,
})
- await syncApi.triggerMemberSync(memberId, { withAggs: false })
+ try {
+ await syncApi.triggerMemberSync(memberId, { withAggs: false })
+ } catch (error) {
+ svc.log.error({ error, memberId }, 'Failed to sync member')
+ throw error
+ }
}
export async function syncOrganization(organizationId: string): Promise<void> {
+ const apiUrl = process.env['CROWD_SEARCH_SYNC_API_URL']
+ if (!apiUrl) {
+ throw new Error('CROWD_SEARCH_SYNC_API_URL environment variable is not set')
+ }
+
const syncApi = new SearchSyncApiClient({
- baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'],
+ baseUrl: apiUrl,
})
- await syncApi.triggerOrganizationSync(organizationId, undefined, { withAggs: false })
+ try {
+ await syncApi.triggerOrganizationSync(organizationId, undefined, { withAggs: false })
+ } catch (error) {
+ svc.log.error({ error, organizationId }, 'Failed to sync organization')
+ throw error
+ }
}
Likely invalid or redundant comment.
Changes proposed ✍️
What
copilot:summary
copilot:poem
Why
How
copilot:walkthrough
Checklist ✅
Feature
,Improvement
, orBug
.Summary by CodeRabbit
Release Notes
New Features
CLAUDE_3_5_SONNET_V2
.Improvements
Bug Fixes
Chores