Skip to content

Conversation

epipav
Copy link
Collaborator

@epipav epipav commented Dec 2, 2024

Changes proposed ✍️

What

copilot:summary

copilot:poem

Why

How

copilot:walkthrough

Checklist ✅

  • Label appropriately with Feature, Improvement, or Bug.
  • Add screenshots to the PR description for relevant FE changes
  • New backend functionality has been unit-tested.
  • API documentation has been updated (if necessary) (see docs on API documentation).
  • Quality standards are met.

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced member enrichment processing with unlimited member handling.
    • Introduced a new LinkedIn profile extraction method for improved data normalization.
    • Added support for a new LLM model variant, CLAUDE_3_5_SONNET_V2.
    • Added a credit check mechanism before data enrichment to optimize processing.
    • Introduced a function for checking intersections between arrays for improved data handling.
  • Improvements

    • Streamlined data sanitization processes across multiple services.
    • Refined logic for handling member contributions and identities.
    • Improved decision-making process for profile selection by checking unverified identities.
    • Updated logging levels to reduce verbosity during operations.
  • Bug Fixes

    • Adjusted logging levels to improve clarity and reduce verbosity during operations.
  • Chores

    • Updated SQL queries to filter out bot members, ensuring only valid entries are processed.

@epipav epipav requested a review from themarolt December 2, 2024 08:40
Copy link

coderabbitai bot commented Dec 2, 2024

Walkthrough

The pull request introduces significant modifications across multiple files in the member enrichment worker service. Key changes include the addition of credit checks in data fetching, enhancements to logging levels, and improved processing logic for member data. New functions for handling string sanitization are added, and the handling of member organizations is refined. Additionally, a new model variant is introduced in the LLM settings. Overall, the updates aim to streamline member enrichment processes and improve data integrity.

Changes

File Change Summary
services/apps/members_enrichment_worker/src/activities/enrichment.ts - Updated getEnrichmentData, updateMemberUsingSquashedPayload, and added syncMember, syncOrganization functions for simplified logic and debug logging.
- Refined processing of identities and contributions.
services/apps/members_enrichment_worker/src/bin/onboarding.ts - Changed maxMembersToProcess from 1000 to Infinity to allow unlimited member processing.
- Updated service configuration settings.
services/apps/members_enrichment_worker/src/sources/clearbit/service.ts - Added replaceDoubleQuotes function for sanitizing employment data in normalizeEmployment.
services/apps/members_enrichment_worker/src/sources/crustdata/service.ts - Added replaceDoubleQuotes for cleaning employment fields and refined email handling in normalizeIdentities.
services/apps/members_enrichment_worker/src/sources/progai/service.ts - Updated normalizeEmployment to refine processing logic and added replaceDoubleQuotes for company fields.
services/apps/members_enrichment_worker/src/sources/serp/service.ts - Introduced getLinkedInProfileHandle method for improved URL handling.
services/apps/members_enrichment_worker/src/workflows/enrichMember.ts - Added credit check using hasRemainingCredits before enrichment data fetching.
services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts - Enhanced identity verification and data sanitization in member enrichment processing.
services/libs/common/src/utils.ts - Updated replaceDoubleQuotes to handle a broader range of double quote characters.
services/libs/common_services/src/services/llm.service.ts - Changed logging level for estimated LLM cost from info to debug.
services/libs/data-access-layer/src/old/apps/members_enrichment_worker/index.ts - Reformatted imports and added SQL condition to filter out bot members in fetchMembersForEnrichment.
services/libs/data-access-layer/src/organizations/organizations.ts - Changed logging level from log.warn to log.debug in findOrCreateOrganization function.
services/libs/types/src/enums/llm.ts - Added new enum value CLAUDE_3_5_SONNET_V2.
services/libs/types/src/llm.ts - Updated model settings to include CLAUDE_3_5_SONNET_V2 and adjusted existing entries accordingly.
services/apps/members_enrichment_worker/src/activities.ts - Added hasRemainingCredits function for credit management.
services/libs/common/src/array.ts - Introduced hasIntersection function to check for common elements in two arrays.
services/libs/types/src/enrichment.ts - Updated IMemberOrganizationData interface to include an optional identities property.

Possibly related PRs

Suggested labels

Improvement, Feature

🐰 "In the fields where data grows,
We enrich the members, as everyone knows.
With quotes replaced and credits checked,
Our logic is clear, our paths perfect.
So hop along, let’s process with glee,
For a brighter tomorrow, just wait and see!" 🐇


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@epipav epipav marked this pull request as ready for review December 2, 2024 09:06
….com:CrowdDotDev/crowd.dev into bugfix/check-org-existence-in-enrichment-fix
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (7)
services/apps/members_enrichment_worker/src/sources/progai/service.ts (2)

277-280: Use URL Parsing for More Reliable GitHub Link Detection

Instead of using includes('github') to check if companyUrl is a GitHub link, consider parsing the URL to extract the hostname and compare it directly to 'github.com'. This will handle edge cases and make the check more robust.


294-295: Use URL Parsing to Extract LinkedIn Company Identifier

Using split('/').pop() may not reliably extract the company identifier from companyLinkedInUrl, especially if the URL has trailing slashes or query parameters. Consider using a URL parser to extract the path segments accurately.

services/apps/members_enrichment_worker/src/activities/enrichment.ts (1)

610-610: Typographical Error in Variable Name

The variable profilesFromUnverfiedIdentities has a typo. It should be profilesFromUnverifiedIdentities.

Apply this diff to correct the typo:

- const profilesFromUnverfiedIdentities: IMemberEnrichmentDataNormalized[] = []
+ const profilesFromUnverifiedIdentities: IMemberEnrichmentDataNormalized[] = []

Also, update all occurrences of this variable throughout the code.

services/apps/members_enrichment_worker/src/workflows/enrichMember.ts (1)

50-54: Consider adding logging for credit exhaustion

While the credit check logic is sound, it would be helpful to log when enrichment is skipped due to insufficient credits. This would aid in monitoring and debugging.

 if (!(await hasRemainingCredits(source))) {
+  this.log.info(`Skipping enrichment for source ${source} due to insufficient credits`, {
+    memberId: input.id,
+    source,
+  });
   await touchMemberEnrichmentCacheUpdatedAt(source, input.id)
   continue
 }
services/apps/members_enrichment_worker/src/sources/serp/service.ts (1)

138-142: Consider enhancing LinkedIn handle extraction

While the regex pattern is correct for standard LinkedIn profile URLs, consider handling additional edge cases:

  1. URLs with tracking parameters after the handle
  2. Language-specific LinkedIn domains
  3. Handles containing special characters
 private getLinkedInProfileHandle(url: string): string | null {
-  const regex = /in\/([^/]+)/
+  const regex = /in\/([^/?#]+)/
   const match = url.match(regex)
-  return match ? match[1] : null
+  return match ? decodeURIComponent(match[1]) : null
 }
services/apps/members_enrichment_worker/src/sources/crustdata/service.ts (1)

355-363: Consider extracting the sanitization logic into a helper function.

While the implementation is correct, the repeated use of replaceDoubleQuotes could be simplified by extracting it into a helper function.

Consider this refactor:

+ private sanitizeEmploymentFields(fields: {
+   name?: string;
+   title?: string;
+   description?: string;
+ }) {
+   return {
+     name: fields.name ? replaceDoubleQuotes(fields.name) : null,
+     title: fields.title ? replaceDoubleQuotes(fields.title) : null,
+     description: fields.description ? replaceDoubleQuotes(fields.description) : null,
+   };
+ }

  private normalizeEmployment(
    data: IMemberEnrichmentDataCrustdata,
    normalized: IMemberEnrichmentDataNormalized,
  ): IMemberEnrichmentDataNormalized {
    // ... existing code ...

    if (employmentInformation.length > 0) {
      for (const workExperience of employmentInformation) {
        const identities = []
        // ... existing identity code ...

+       const sanitizedFields = this.sanitizeEmploymentFields({
+         name: workExperience.employer_name,
+         title: workExperience.employee_title,
+         description: workExperience.employer_linkedin_description,
+       });

        normalized.memberOrganizations.push({
-         name: replaceDoubleQuotes(workExperience.employer_name),
+         name: sanitizedFields.name,
          source: OrganizationSource.ENRICHMENT_CRUSTDATA,
          identities,
-         title: replaceDoubleQuotes(workExperience.employee_title),
+         title: sanitizedFields.title,
          startDate: workExperience?.start_date ?? null,
          endDate: workExperience?.end_date ?? null,
-         organizationDescription: replaceDoubleQuotes(
-           workExperience.employer_linkedin_description,
-         ),
+         organizationDescription: sanitizedFields.description,
        })
      }
    }
services/libs/data-access-layer/src/old/apps/members_enrichment_worker/index.ts (1)

130-130: LGTM: Added bot filtering for member enrichment

The addition of the bot filtering condition is a good optimization that prevents unnecessary processing of bot accounts. The implementation safely handles null values by defaulting to false.

Consider adding an index on the (attributes->>'isBot')::boolean expression if you expect a significant number of bot accounts, as this would improve the query performance.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7f47984 and 3b5c48a.

📒 Files selected for processing (14)
  • services/apps/members_enrichment_worker/src/activities/enrichment.ts (8 hunks)
  • services/apps/members_enrichment_worker/src/bin/onboarding.ts (1 hunks)
  • services/apps/members_enrichment_worker/src/sources/clearbit/service.ts (2 hunks)
  • services/apps/members_enrichment_worker/src/sources/crustdata/service.ts (2 hunks)
  • services/apps/members_enrichment_worker/src/sources/progai/service.ts (2 hunks)
  • services/apps/members_enrichment_worker/src/sources/serp/service.ts (1 hunks)
  • services/apps/members_enrichment_worker/src/workflows/enrichMember.ts (2 hunks)
  • services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts (3 hunks)
  • services/libs/common/src/utils.ts (1 hunks)
  • services/libs/common_services/src/services/llm.service.ts (1 hunks)
  • services/libs/data-access-layer/src/old/apps/members_enrichment_worker/index.ts (3 hunks)
  • services/libs/data-access-layer/src/organizations/organizations.ts (1 hunks)
  • services/libs/types/src/enums/llm.ts (1 hunks)
  • services/libs/types/src/llm.ts (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • services/libs/common_services/src/services/llm.service.ts
🧰 Additional context used
🪛 GitHub Check: lint-format-services
services/libs/data-access-layer/src/old/apps/members_enrichment_worker/index.ts

[failure] 1-1:
Delete ;


[failure] 2-2:
Delete ;


[failure] 3-3:
Replace ·IAttributes,·IEnrichableMember,·IMemberEnrichmentCache,·IMemberEnrichmentSourceQueryInput,·IMemberIdentity,·IMemberOrganizationData,·IMemberOriginalData,·IOrganizationIdentity,·MemberEnrichmentSource,·MemberIdentityType,·OrganizationSource·}·from·'@crowd/types';⏎ with ⏎··IAttributes,⏎··IEnrichableMember,⏎··IMemberEnrichmentCache,⏎··IMemberEnrichmentSourceQueryInput,⏎··IMemberIdentity,⏎··IMemberOrganizationData,⏎··IMemberOriginalData,⏎··IOrganizationIdentity,⏎··MemberEnrichmentSource,⏎··MemberIdentityType,⏎··OrganizationSource,⏎}·from·'@crowd/types'


[failure] 704-704:
Insert

🔇 Additional comments (12)
services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts (2)

236-236: Data Sanitization Applied to Squashed Attributes

Good job applying cleanAttributeValue to ensure the squashed attributes are sanitized before being added to the payload. This helps maintain data integrity.


268-283: Correct Implementation to Limit Verified Identities

The logic correctly ensures that only one verified identity is retained in each work experience by marking additional verified identities as unverified. This prevents potential database conflicts with duplicate verified identities.

services/apps/members_enrichment_worker/src/activities/enrichment.ts (1)

68-69: Simplified Control Flow in getEnrichmentData Function

Removing the credit check from getEnrichmentData simplifies the function's control flow. Ensure that credit checks are handled appropriately elsewhere to manage resource usage.

services/libs/types/src/enums/llm.ts (1)

3-3: New LLM Model Type Added

The addition of CLAUDE_3_5_SONNET_V2 to the LlmModelType enum is correctly implemented. This extends the available models for LLM queries.

services/libs/common/src/utils.ts (1)

76-76: Improved Double Quote Replacement

The replaceDoubleQuotes function now handles various Unicode representations of double quotes, enhancing data sanitization across different character sets.

services/libs/types/src/llm.ts (2)

46-49: Confirm pricing accuracy for new model

The pricing configuration for CLAUDE_3_5_SONNET_V2 matches CLAUDE_3_5_SONNET. Please verify these values are accurate according to the provider's documentation.


62-62: Verify impact of model change on enrichment quality

Multiple enrichment queries have been updated to use CLAUDE_3_5_SONNET_V2. While the model configurations are consistent, we should verify that the new model maintains or improves the quality of enrichment results.

Also applies to: 70-70, 78-78

services/apps/members_enrichment_worker/src/sources/clearbit/service.ts (2)

3-3: LGTM: Import statement added correctly.

The replaceDoubleQuotes utility function is imported from the common package.


220-223: LGTM: Employment data sanitization implemented correctly.

The replaceDoubleQuotes function is appropriately applied to both the organization name and title fields.

services/apps/members_enrichment_worker/src/bin/onboarding.ts (1)

30-30: Verify resource implications of unlimited member processing.

Changing maxMembersToProcess to Infinity removes the upper bound on member processing. While this allows for complete processing of all eligible members, consider:

  1. Memory usage over extended runs
  2. Database connection pool limits
  3. API rate limits for enrichment services

The maxConcurrentProcessing limit of 5 provides some protection, but monitoring system resources during extended runs is advised.

✅ Verification successful

Let me gather more specific information about memory management and concurrency controls in this worker.


Let me check the member processing implementation to understand the batching and database interaction patterns.


Based on the gathered information, I can now provide a final response about the resource implications:

Setting maxMembersToProcess to Infinity is acceptable with current safeguards

The implementation has several built-in protections against resource exhaustion:

  1. Concurrent processing is limited to 5 members at a time via maxConcurrentProcessing = 5
  2. Members are processed in pages with controlled database queries that filter for specific criteria (>1 enrichment source, >100 activities)
  3. Each member workflow has a 15-minute timeout and retry limits
  4. Database load is naturally regulated through the temporal workflow queue

The Infinity setting simply allows the worker to continue processing eligible members without an arbitrary cap, while the above controls prevent resource overload.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for any rate limiting configurations in the codebase
rg -A 3 'rate.*limit|throttle|maxRequest|concurrent'

# Look for memory management configurations
rg -A 3 'memory|heap|cache'

Length of output: 66264


Script:

#!/bin/bash
# Look for worker configuration and concurrency settings
rg -A 3 "maxConcurrentProcessing|batch.*size|chunk.*size|queue.*config" services/apps/members_enrichment_worker/

# Look for memory management in the worker
rg -A 3 "memory|heap" services/apps/members_enrichment_worker/

Length of output: 917


Script:

#!/bin/bash
# Look for database operations and member processing logic
rg -A 10 "getEnrichableMembers|processMember" services/apps/members_enrichment_worker/src/bin/onboarding.ts

# Check for any database connection pool configurations
rg -A 3 "pool.*size|connection.*pool" services/libs/data-access-layer/

Length of output: 2625

services/apps/members_enrichment_worker/src/sources/crustdata/service.ts (1)

3-3: LGTM: Import statement added correctly.

The replaceDoubleQuotes utility function is imported alongside the existing isEmail import.

services/libs/data-access-layer/src/organizations/organizations.ts (1)

516-519: LGTM: Appropriate logging level adjustment

The change from warn to debug level is appropriate here as this is an expected scenario in the normal flow, not a warning condition.

type: MemberIdentityType.USERNAME,
verified: false,
value: this.normalizeLinkedUrl(data.linkedinUrl).split('/').pop(),
value: this.getLinkedInProfileHandle(this.normalizeLinkedUrl(data.linkedinUrl)),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for handle extraction

The normalize method should handle cases where getLinkedInProfileHandle returns null.

-  value: this.getLinkedInProfileHandle(this.normalizeLinkedUrl(data.linkedinUrl)),
+  value: this.getLinkedInProfileHandle(this.normalizeLinkedUrl(data.linkedinUrl)) ?? 
+    this.log.warn('Failed to extract LinkedIn handle', { url: data.linkedinUrl }) || null,

Committable suggestion skipped: line range outside the PR's diff.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts (1)

268-282: Consider extracting the identity verification logic into a separate function

While the logic for handling verified identities is correct, it could be more maintainable if extracted into a dedicated function. This would improve readability and make the code more testable.

Consider refactoring to:

+ function ensureSingleVerifiedIdentity(identities: any[]): any[] {
+   let found = false;
+   return (identities || []).map((i) => {
+     if (i.verified && !found) {
+       found = true;
+       return i;
+     } else if (i.verified) {
+       return { ...i, verified: false };
+     }
+     return i;
+   });
+ }

  workExperiencesSquashedByLLM.forEach((we) => {
-   let found = false
-   we.identities = (we.identities || []).map((i) => {
-     if (i.verified && !found) {
-       found = true;
-       return i;
-     } else if (i.verified) {
-       return { ...i, verified: false };
-     }
-     return i;
-   });
+   we.identities = ensureSingleVerifiedIdentity(we.identities);
  });

This refactoring:

  1. Improves code organization and reusability
  2. Makes the logic easier to test in isolation
  3. Maintains the same functionality while being more maintainable
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9fac18e and 64aee6c.

📒 Files selected for processing (2)
  • services/apps/members_enrichment_worker/src/activities/enrichment.ts (8 hunks)
  • services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • services/apps/members_enrichment_worker/src/activities/enrichment.ts
🔇 Additional comments (1)
services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts (1)

236-236: LGTM: Consistent attribute value cleaning

The addition of cleanAttributeValue for LLM-squashed attributes maintains data consistency with individually processed attributes.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
services/apps/members_enrichment_worker/src/sources/progai/service.ts (2)

275-289: Consider making domain verification logic more explicit

The GitHub URL exclusion logic is good, but consider extracting it into a separate method for better maintainability and reusability.

+ private isValidCompanyDomain(url: string, company: string): boolean {
+   const isGithubUrl = url.toLowerCase().includes('github')
+   const isGithubCompany = (company || '').toLowerCase().includes('github')
+   return !isGithubUrl && !isGithubCompany
+ }

  if (workExperience.companyUrl) {
    const normalizedDomain = websiteNormalizer(workExperience.companyUrl, false)
-   if (
-     normalizedDomain &&
-     !workExperience.companyUrl.toLowerCase().includes('github') &&
-     !(workExperience.company || '').toLowerCase().includes('github')
-   ) {
+   if (normalizedDomain && this.isValidCompanyDomain(workExperience.companyUrl, workExperience.company)) {

320-336: Consider enhancing LinkedIn URL handling

While the current implementation handles basic company and school URLs well, consider:

  1. Adding input validation
  2. Supporting more LinkedIn URL formats (e.g., with query parameters)
 private getLinkedInProfileHandle(url: string): string | null {
+  if (!url || typeof url !== 'string') {
+    return null;
+  }
+  
+  // Remove query parameters and trailing slashes
+  const cleanUrl = url.split('?')[0].replace(/\/+$/, '');
+
   let regex = /company\/([^/]+)/
-  let match = url.match(regex)
+  let match = cleanUrl.match(regex)

   if (match) {
     return `company:${match[1]}`
   }

   regex = /school\/([^/]+)/
-  match = url.match(regex)
+  match = cleanUrl.match(regex)

   if (match) {
     return `school:${match[1]}`
   }

   return null
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 64aee6c and a859861.

📒 Files selected for processing (1)
  • services/apps/members_enrichment_worker/src/sources/progai/service.ts (2 hunks)
🔇 Additional comments (2)
services/apps/members_enrichment_worker/src/sources/progai/service.ts (2)

4-4: LGTM: Good addition of string sanitization utility

The addition of replaceDoubleQuotes from @crowd/common is a good practice for sanitizing company names and titles.


264-268: Good improvement in null handling

The enhanced conditions ensure we only process work experiences with valid company information, preventing potential issues with null values.

Comment on lines +308 to +311
startDate: workExperience.startDate
? workExperience.startDate.replace('Z', '+00:00')
: null,
endDate: workExperience.endDate ? workExperience.endDate.replace('Z', '+00:00') : null,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Based on the code analysis, I can now provide the final response:

Add date format validation for work experience dates

The code assumes dates from the ProgAI API are in a format ending with 'Z' (UTC timezone marker), but there's no validation to ensure this. The type definition shows that startDate and endDate are just string types without format constraints. Consider:

  • Adding input validation to ensure dates match the expected format before manipulation
  • Using a date parsing library like date-fns or similar to handle the timezone conversion safely
  • Adding error handling for malformed dates
🔗 Analysis chain

Validate date string manipulation

The date string manipulation assumes a specific format. Consider adding validation or using a date manipulation library to handle different date formats safely.


Let me check the input date format and validation in the codebase.


Let me check if there's any date format validation or parsing before this transformation.


Let me check the entire file content to understand the context better.


Let me check the type definitions for the work experience data structure.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for potential date format issues in the codebase
rg -A 2 "replace\('Z', '\+00:00'\)" 

Length of output: 608


Script:

#!/bin/bash
# Check the workExperience type definition and any date validation
ast-grep --pattern 'interface $_WorkExperience {
  $$$
  startDate$_
  $$$
  endDate$_
  $$$
}'

# Check for any date validation or parsing logic
rg -B 2 -A 2 "workExperience\.(startDate|endDate)" 

Length of output: 1053


Script:

#!/bin/bash
# Check for date validation or parsing in the file
rg -B 5 -A 5 "function.*workExperience" services/apps/members_enrichment_worker/src/sources/progai/service.ts

# Check if there's any date validation utility being used
rg -l "import.*date|moment|luxon|dayjs" services/apps/members_enrichment_worker/src/sources/progai/service.ts

# Look for any date format validation
ast-grep --pattern 'function $_(workExperience$_) {
  $$$
}'

Length of output: 286


Script:

#!/bin/bash
# Get the content of the service file
cat services/apps/members_enrichment_worker/src/sources/progai/service.ts

# Check for any date-related types or interfaces
ast-grep --pattern 'type $_ = {
  $$$
  date$_
  $$$
}'

# Check for any date validation functions
rg -l "validateDate|isValidDate|parseDate" services/apps/members_enrichment_worker/src/sources/progai/service.ts

Length of output: 11440


Script:

#!/bin/bash
# Check the types file for work experience interface
cat services/apps/members_enrichment_worker/src/sources/progai/types.ts

# Check if there's any date format validation in the utils
rg -l "date" services/apps/members_enrichment_worker/src/utils/

# Look for any date-related imports in the types file
rg "import.*date" services/apps/members_enrichment_worker/src/sources/progai/types.ts

Length of output: 2202

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts (2)

268-282: Consider enhancing identity selection logic

While the logic to prevent multiple verified identities is sound, consider:

  1. Adding logging when identities are marked as unverified
  2. Implementing a more deterministic selection criteria (e.g., based on timestamp or data quality)
 workExperiencesSquashedByLLM.forEach((we) => {
   let found = false
   we.identities = (we.identities || []).map((i) => {
     if (i.verified && !found) {
       found = true
       return i
     } else if (i.verified) {
+      console.log(`Marking identity as unverified for organization to prevent merge issues: ${JSON.stringify(i)}`)
       return { ...i, verified: false }
     }
     return i
   })
 })

Line range hint 1-307: Consider architectural improvements

While the code is functional, consider the following improvements:

  1. Break down the large processMemberSources function into smaller, focused functions for better maintainability:
    • squashIdentities
    • squashAttributes
    • squashWorkExperiences
  2. Add performance monitoring for LLM operations
  3. Enhance error handling for edge cases

This would improve maintainability and make the code easier to test.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a859861 and 5e2fcb0.

📒 Files selected for processing (4)
  • services/apps/members_enrichment_worker/src/activities.ts (2 hunks)
  • services/apps/members_enrichment_worker/src/activities/enrichment.ts (13 hunks)
  • services/apps/members_enrichment_worker/src/workflows/enrichMember.ts (2 hunks)
  • services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • services/apps/members_enrichment_worker/src/workflows/enrichMember.ts
  • services/apps/members_enrichment_worker/src/activities/enrichment.ts
🔇 Additional comments (5)
services/apps/members_enrichment_worker/src/activities.ts (2)

11-11: LGTM! Import added in correct alphabetical order.

The new import follows the existing pattern and maintains alphabetical ordering.


79-79: Consider maintaining alphabetical order in exports and verify credit check implementation.

While the export addition is functionally correct, consider maintaining alphabetical ordering in the exports list for better maintainability. Additionally, since this function is critical for managing enrichment credits, ensure it's properly integrated across all enrichment workflows.

Let's verify the credit check implementation:

✅ Verification successful

Credit check implementation is properly integrated and working as expected.

The verification shows that:

  • The hasRemainingCredits function is properly defined in the interface (types.ts) and implemented by all enrichment sources
  • It's correctly used in the enrichment workflow to check credits before proceeding with enrichment
  • The implementation includes proper caching mechanism in Redis to avoid frequent API calls
  • Each source has its own implementation:
    • SERP API checks account endpoint
    • CrustData checks credits endpoint
    • Others have placeholder implementations returning true

The only minor suggestion about alphabetical ordering in exports remains valid, but the credit check implementation is robust and properly integrated.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the credit check implementation and its usage

# Check where hasRemainingCredits is defined and how it's implemented
ast-grep --pattern 'function hasRemainingCredits($_) {
  $$$
}'

# Check where hasRemainingCredits is used in enrichment workflows
rg -A 5 'hasRemainingCredits'

Length of output: 8838

services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts (3)

236-236: LGTM: Consistent attribute sanitization

Good addition of sanitization for LLM-processed attributes, maintaining data consistency with other attribute processing.


268-282: Verify impact on existing organizations

The new identity verification logic could affect organization merging behavior.


297-297: Verify CrustData profile selection impact

The addition of the CrustData profile selection flag affects member updates.

✅ Verification successful

CrustData profile selection flag is correctly implemented

The flag isHighConfidenceSourceSelectedForWorkExperiences (passed as !!crustDataProfileSelected) is used to determine how work experiences should be handled during member updates. When CrustData is selected as the source:

  • All existing work experiences are deleted (including manually created ones)
  • All new work experiences from CrustData are created
  • This indicates high confidence in CrustData's accuracy for work experience data

This implementation is intentional and aligns with the workflow's purpose of preferring CrustData's work experience data when available.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify how the CrustData profile selection flag affects member updates

# Search for the usage of this flag in the update logic
ast-grep --pattern 'updateMemberUsingSquashedPayload.*crustData'

# Look for related CrustData processing logic
rg -A 5 'CrustData.*profile|profile.*CrustData'

Length of output: 116


Script:

#!/bin/bash
# Let's try a broader search to understand the context better

# Search for updateMemberUsingSquashedPayload function definition and usage
ast-grep --pattern 'function updateMemberUsingSquashedPayload'

# Search for CrustData related code
rg -l "CrustData"

# Search for the specific file content to understand the context
cat services/apps/members_enrichment_worker/src/workflows/processMemberSources.ts

Length of output: 10651


Script:

#!/bin/bash
# Let's look for the activity implementation to understand how the flag is used

# Search for updateMemberUsingSquashedPayload implementation
fd -e ts updateMemberUsingSquashedPayload --exec cat {}

# Search for any related configuration or constants
rg -A 5 "updateMemberUsingSquashedPayload.*crustData" services/apps/members_enrichment_worker/src/activities/

Length of output: 171


Script:

#!/bin/bash
# Let's try a different approach to locate the activity implementation

# Look for the activities directory content
ls -R services/apps/members_enrichment_worker/src/activities/

# Try to find the enrichment activity file
cat services/apps/members_enrichment_worker/src/activities/enrichment.ts

Length of output: 33974

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (1)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (1)

233-233: Consider keeping important state changes at INFO level.

While reducing log verbosity is good, consider keeping important state changes at INFO level for better operational visibility. For example, member attribute updates and reach changes are significant state modifications that operations might want to monitor without enabling DEBUG logging.

Also applies to: 252-252, 284-284, 302-302

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 5e2fcb0 and 22121ea.

📒 Files selected for processing (1)
  • services/apps/members_enrichment_worker/src/activities/enrichment.ts (17 hunks)

Comment on lines +908 to +921
export async function syncMember(memberId: string): Promise<void> {
const syncApi = new SearchSyncApiClient({
baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'],
})

// Convert strings to timestamps, using fallbacks for missing dates
const start1 = d1Start ? new Date(d1Start).getTime() : -Infinity
const end1 = d1End ? new Date(d1End).getTime() : Infinity
const start2 = d2Start ? new Date(d2Start).getTime() : -Infinity
const end2 = d2End ? new Date(d2End).getTime() : Infinity
await syncApi.triggerMemberSync(memberId, { withAggs: false })
}

export async function syncOrganization(organizationId: string): Promise<void> {
const syncApi = new SearchSyncApiClient({
baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'],
})

// Periods intersect if one period's start is before other period's end
// and that same period's end is after the other period's start
return start1 <= end2 && end1 >= start2
await syncApi.triggerOrganizationSync(organizationId, undefined, { withAggs: false })
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add error handling and environment validation for sync functions.

The sync functions need:

  1. Error handling for API calls
  2. Validation of environment variables
  3. Retry logic for transient failures

Add error handling:

 export async function syncMember(memberId: string): Promise<void> {
+  const apiUrl = process.env['CROWD_SEARCH_SYNC_API_URL']
+  if (!apiUrl) {
+    throw new Error('CROWD_SEARCH_SYNC_API_URL environment variable is not set')
+  }
+
   const syncApi = new SearchSyncApiClient({
-    baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'],
+    baseUrl: apiUrl,
   })

-  await syncApi.triggerMemberSync(memberId, { withAggs: false })
+  try {
+    await syncApi.triggerMemberSync(memberId, { withAggs: false })
+  } catch (error) {
+    svc.log.error({ error, memberId }, 'Failed to sync member')
+    throw error
+  }
 }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +815 to +832
isHighConfidenceSourceSelectedForWorkExperiences: boolean,
): IWorkExperienceChanges {
// we delete all the work experiences that were not manually created
const toDelete = oldVersion.filter((c) => c.source !== OrganizationSource.UI)
let toDelete = oldVersion.filter((c) => c.source !== OrganizationSource.UI)

const toCreate: IMemberEnrichmentDataNormalizedOrganization[] = []
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const toUpdate: Map<IMemberOrganizationData, Record<string, any>> = new Map()

if (isHighConfidenceSourceSelectedForWorkExperiences) {
toDelete = oldVersion
toCreate.push(...newVersion)
return {
toDelete,
toCreate,
toUpdate,
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Document the high confidence override behavior and add safeguards.

The isHighConfidenceSourceSelectedForWorkExperiences parameter allows overriding all existing work experiences, including manually created ones. This is a critical change that needs:

  1. Documentation explaining when this override should be used
  2. Safeguards to prevent accidental data loss
  3. Logging of override events for audit purposes

Add documentation and logging:

 function prepareWorkExperiences(
   oldVersion: IMemberOrganizationData[],
   newVersion: IMemberEnrichmentDataNormalizedOrganization[],
   isHighConfidenceSourceSelectedForWorkExperiences: boolean,
 ): IWorkExperienceChanges {
+  // When isHighConfidenceSourceSelectedForWorkExperiences is true, all existing work experiences
+  // will be replaced with the new version, including manually created ones. This should only be
+  // used when the source data is highly reliable and complete.
+  svc.log.info(
+    { isHighConfidenceSourceSelectedForWorkExperiences },
+    'Processing work experiences with override mode'
+  )

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +865 to +893
const match = orderedNewVersion.find(
(e) =>
e.title === current.jobTitle &&
e.identities &&
e.identities.some((e) => e.organizationId === current.orgId),
)
if (!match) {
// if we didn't find a match by title we should check dates
match = orderedNewVersion.find(
(e) =>
dateIntersects(current.dateStart, current.dateEnd, e.startDate, e.endDate) &&
e.identities &&
e.identities.some((e) => e.organizationId === current.orgId),
)
}

// if we found a match we can check if we need something to update
if (match) {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
if (
match &&
current.dateStart === match.startDate &&
current.dateEnd === null &&
match.endDate !== null
) {
const toUpdateInner: Record<string, any> = {}

// lets check if the dates and title are the same otherwise we need to update them
if (current.dateStart !== match.startDate) {
toUpdateInner.dateStart = match.startDate
}

if (current.dateEnd !== match.endDate) {
toUpdateInner.dateEnd = match.endDate
}

if (current.jobTitle !== match.title) {
toUpdateInner.title = match.title
}

if (Object.keys(toUpdateInner).length > 0) {
toUpdate.set(current, toUpdateInner)
}
toUpdateInner.dateEnd = match.endDate
toUpdate.set(current, toUpdateInner)

// remove the match from the new version array so we later don't process it again
orderedNewVersion = orderedNewVersion.filter((e) => e.id !== match.id)
} else if (
match &&
(current.dateStart !== match.startDate || current.dateEnd !== null || match.endDate === null)
) {
// there's an incoming work experiences, but it's conflicting with the existing manually updated data
// we shouldn't add or update anything when this happens
// we can only update dateEnd of existing manually changed data, when it has a null dateEnd
orderedNewVersion = orderedNewVersion.filter((e) => e.id !== match.id)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve error handling for work experience matching.

The work experience matching logic needs better error handling:

  1. Validation of input data structures
  2. Handling of edge cases where identities array is undefined
  3. Logging of matching decisions for debugging

Add error handling:

     const match = orderedNewVersion.find(
       (e) =>
         e.title === current.jobTitle &&
-        e.identities &&
-        e.identities.some((e) => e.organizationId === current.orgId),
+        Array.isArray(e.identities) &&
+        e.identities.some((identity) => {
+          if (!identity || !identity.organizationId) {
+            svc.log.warn({ experience: e }, 'Invalid identity data in work experience')
+            return false
+          }
+          return identity.organizationId === current.orgId
+        }),
     )

+    svc.log.debug(
+      { current, match },
+      match ? 'Found matching work experience' : 'No matching work experience found'
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const match = orderedNewVersion.find(
(e) =>
e.title === current.jobTitle &&
e.identities &&
e.identities.some((e) => e.organizationId === current.orgId),
)
if (!match) {
// if we didn't find a match by title we should check dates
match = orderedNewVersion.find(
(e) =>
dateIntersects(current.dateStart, current.dateEnd, e.startDate, e.endDate) &&
e.identities &&
e.identities.some((e) => e.organizationId === current.orgId),
)
}
// if we found a match we can check if we need something to update
if (match) {
// eslint-disable-next-line @typescript-eslint/no-explicit-any
if (
match &&
current.dateStart === match.startDate &&
current.dateEnd === null &&
match.endDate !== null
) {
const toUpdateInner: Record<string, any> = {}
// lets check if the dates and title are the same otherwise we need to update them
if (current.dateStart !== match.startDate) {
toUpdateInner.dateStart = match.startDate
}
if (current.dateEnd !== match.endDate) {
toUpdateInner.dateEnd = match.endDate
}
if (current.jobTitle !== match.title) {
toUpdateInner.title = match.title
}
if (Object.keys(toUpdateInner).length > 0) {
toUpdate.set(current, toUpdateInner)
}
toUpdateInner.dateEnd = match.endDate
toUpdate.set(current, toUpdateInner)
// remove the match from the new version array so we later don't process it again
orderedNewVersion = orderedNewVersion.filter((e) => e.id !== match.id)
} else if (
match &&
(current.dateStart !== match.startDate || current.dateEnd !== null || match.endDate === null)
) {
// there's an incoming work experiences, but it's conflicting with the existing manually updated data
// we shouldn't add or update anything when this happens
// we can only update dateEnd of existing manually changed data, when it has a null dateEnd
orderedNewVersion = orderedNewVersion.filter((e) => e.id !== match.id)
const match = orderedNewVersion.find(
(e) =>
e.title === current.jobTitle &&
Array.isArray(e.identities) &&
e.identities.some((identity) => {
if (!identity || !identity.organizationId) {
svc.log.warn({ experience: e }, 'Invalid identity data in work experience')
return false
}
return identity.organizationId === current.orgId
}),
)
svc.log.debug(
{ current, match },
match ? 'Found matching work experience' : 'No matching work experience found'
)
// if we found a match we can check if we need something to update
if (
match &&
current.dateStart === match.startDate &&
current.dateEnd === null &&
match.endDate !== null
) {
const toUpdateInner: Record<string, any> = {}
toUpdateInner.dateEnd = match.endDate
toUpdate.set(current, toUpdateInner)
// remove the match from the new version array so we later don't process it again
orderedNewVersion = orderedNewVersion.filter((e) => e.id !== match.id)
} else if (
match &&
(current.dateStart !== match.startDate || current.dateEnd !== null || match.endDate === null)
) {
// there's an incoming work experiences, but it's conflicting with the existing manually updated data
// we shouldn't add or update anything when this happens
// we can only update dateEnd of existing manually changed data, when it has a null dateEnd
orderedNewVersion = orderedNewVersion.filter((e) => e.id !== match.id)

Comment on lines 324 to 368
// try matching member's existing organizations with the new ones
// we'll be using displayName, title, dates
for (const org of squashedPayload.memberOrganizations) {
if (!org.organizationId) {
// Check if any similar in existing work experiences
const existingOrg = existingMemberData.organizations.find((o) => {
const incomingOrgStartDate = org.startDate ? new Date(org.startDate) : null
const incomingOrgEndDate = org.endDate ? new Date(org.endDate) : null
const existingOrgStartDate = o.dateStart ? new Date(o.dateStart) : null
const existingOrgEndEndDate = o.dateEnd ? new Date(o.dateEnd) : null

const isSameStartMonthYear =
(!incomingOrgStartDate && !existingOrgStartDate) || // Both start dates are null
(incomingOrgStartDate &&
existingOrgStartDate &&
incomingOrgStartDate.getMonth() === existingOrgStartDate.getMonth() &&
incomingOrgStartDate.getFullYear() === existingOrgStartDate.getFullYear())

const isSameEndMonthYear =
(!incomingOrgEndDate && !existingOrgEndEndDate) || // Both end dates are null
(incomingOrgEndDate &&
existingOrgEndEndDate &&
incomingOrgEndDate.getMonth() === existingOrgEndEndDate.getMonth() &&
incomingOrgEndDate.getFullYear() === existingOrgEndEndDate.getFullYear())

return (
(o.orgName.toLowerCase().includes(org.name.toLowerCase()) ||
org.name.toLowerCase().includes(o.orgName.toLowerCase())) &&
((isSameStartMonthYear && isSameEndMonthYear) || org.title === o.jobTitle)
)
})

if (existingOrg) {
// Get all orgs with the same name as the current one
const matchingOrgs = squashedPayload.memberOrganizations.filter(
(otherOrg) => otherOrg.name === org.name,
)

// Set organizationId for all matching orgs
for (const matchingOrg of matchingOrgs) {
matchingOrg.organizationId = existingOrg.orgId
}
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Refactor organization matching logic for better maintainability.

The organization matching logic is complex and could benefit from:

  1. Extracting date comparison logic into a helper function
  2. Using a more robust name comparison (e.g., Levenshtein distance)
  3. Adding comments explaining the matching criteria

Consider refactoring like this:

+function isSameMonthYear(date1: Date | null, date2: Date | null): boolean {
+  return (!date1 && !date2) || // Both dates are null
+    (date1 && date2 &&
+      date1.getMonth() === date2.getMonth() &&
+      date1.getFullYear() === date2.getFullYear())
+}
+
+function areOrganizationNamesSimilar(name1: string, name2: string): boolean {
+  // TODO: Implement Levenshtein distance comparison
+  return name1.toLowerCase().includes(name2.toLowerCase()) ||
+    name2.toLowerCase().includes(name1.toLowerCase())
+}

 // Check if any similar in existing work experiences
 const existingOrg = existingMemberData.organizations.find((o) => {
   const incomingOrgStartDate = org.startDate ? new Date(org.startDate) : null
   const incomingOrgEndDate = org.endDate ? new Date(org.endDate) : null
   const existingOrgStartDate = o.dateStart ? new Date(o.dateStart) : null
   const existingOrgEndEndDate = o.dateEnd ? new Date(o.dateEnd) : null

-  const isSameStartMonthYear =
-    (!incomingOrgStartDate && !existingOrgStartDate) || // Both start dates are null
-    (incomingOrgStartDate &&
-      existingOrgStartDate &&
-      incomingOrgStartDate.getMonth() === existingOrgStartDate.getMonth() &&
-      incomingOrgStartDate.getFullYear() === existingOrgStartDate.getFullYear())
+  const isSameStartMonthYear = isSameMonthYear(incomingOrgStartDate, existingOrgStartDate)
+  const isSameEndMonthYear = isSameMonthYear(incomingOrgEndDate, existingOrgEndEndDate)

   return (
-    (o.orgName.toLowerCase().includes(org.name.toLowerCase()) ||
-      org.name.toLowerCase().includes(o.orgName.toLowerCase())) &&
+    areOrganizationNamesSimilar(o.orgName, org.name) &&
     ((isSameStartMonthYear && isSameEndMonthYear) || org.title === o.jobTitle)
   )
 })

Committable suggestion skipped: line range outside the PR's diff.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (2)

239-239: Consider using structured logging.

The logging changes from info to debug are consistent, but could benefit from structured logging patterns.

Consider enhancing the logging:

-svc.log.debug({ memberId }, 'Adding to member identities!')
+svc.log.debug({ 
+  memberId,
+  action: 'add_member_identities',
+  identityCount: squashedPayload.identities.length 
+}, 'Adding member identities')

Also applies to: 258-258, 290-290, 308-308


330-351: Enhance organization matching logic with early returns.

The nested organization matching logic could be more maintainable with early returns and clearer variable names.

Consider restructuring:

-      // try matching member's existing organizations with the new ones
-      // we'll be using displayName, title, dates
-      for (const org of squashedPayload.memberOrganizations) {
-        if (!org.organizationId) {
-          // Check if any similar in existing work experiences
-          const existingOrg = existingMemberData.organizations.find((o) =>
-            doesIncomingOrgExistInExistingOrgs(o, org),
-          )
-
-          if (existingOrg) {
-            // Get all orgs with the same name as the current one
-            const matchingOrgs = squashedPayload.memberOrganizations.filter(
-              (otherOrg) => otherOrg.name === org.name,
-            )
-
-            // Set organizationId for all matching orgs
-            for (const matchingOrg of matchingOrgs) {
-              matchingOrg.organizationId = existingOrg.orgId
-            }
-          }
-        }
-      }
+      // Match member's existing organizations with new ones using displayName, title, and dates
+      for (const incomingOrg of squashedPayload.memberOrganizations) {
+        // Skip if organization is already identified
+        if (incomingOrg.organizationId) continue;
+
+        // Find matching existing organization
+        const matchingExistingOrg = existingMemberData.organizations.find(existingOrg =>
+          doesIncomingOrgExistInExistingOrgs(existingOrg, incomingOrg)
+        );
+
+        if (!matchingExistingOrg) continue;
+
+        // Update all organizations with the same name
+        const relatedOrgs = squashedPayload.memberOrganizations.filter(
+          org => org.name === incomingOrg.name
+        );
+        
+        relatedOrgs.forEach(org => {
+          org.organizationId = matchingExistingOrg.orgId;
+        });
+      }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3a52818 and 82aed4f.

📒 Files selected for processing (4)
  • services/apps/members_enrichment_worker/src/activities/enrichment.ts (20 hunks)
  • services/libs/common/src/array.ts (1 hunks)
  • services/libs/data-access-layer/src/old/apps/members_enrichment_worker/index.ts (3 hunks)
  • services/libs/types/src/enrichment.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • services/libs/data-access-layer/src/old/apps/members_enrichment_worker/index.ts
🔇 Additional comments (6)
services/libs/types/src/enrichment.ts (2)

3-3: LGTM! Import statement is properly placed and used.

The import of IOrganizationIdentity is correctly placed with other related imports and is utilized within the file.


43-43: LGTM! Interface extension enhances organization identity tracking.

The optional identities property is well-designed:

  • Uses consistent naming with other interfaces
  • Properly typed as an array of organization identities
  • Maintains backward compatibility by being optional

Let's verify the usage of this new property:

✅ Verification successful

Based on the search results, I can now generate the final response:

The optional identities property is well-integrated and actively used across the codebase

The property is properly integrated into the organization-related functionality:

  • Used in member enrichment worker for organization data processing
  • Utilized in organization similarity calculations for merge suggestions
  • Implemented in organization repositories for identity management
  • Integrated with audit logging for tracking identity changes
  • Applied in organization sync services for data synchronization
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check how the new identities property is being used across the codebase

# Search for IMemberOrganizationData usage
rg -t typescript "IMemberOrganizationData" -A 5

# Search for specific usage of the identities property
rg -t typescript "organizations.*identities" -A 5

Length of output: 169


Script:

#!/bin/bash
# Let's try without typescript file type flag since it's not recognized
# Search for IMemberOrganizationData interface usage
rg "IMemberOrganizationData" -A 5

# Search for files containing organization identities
rg "organizations.*identities" -A 5

# Also search for direct usage of IOrganizationIdentity
rg "IOrganizationIdentity" -A 5

Length of output: 33257

services/apps/members_enrichment_worker/src/activities/enrichment.ts (4)

3-8: LGTM! Clean import organization.

The imports are well-organized and the new utilities from @crowd/common are properly imported.


934-940: ⚠️ Potential issue

Add error handling for sync API calls.

The syncMember function needs proper error handling and environment validation.


942-947: ⚠️ Potential issue

Add error handling for organization sync API calls.

The syncOrganization function needs proper error handling and environment validation.

Apply similar error handling as suggested for syncMember:

 export async function syncOrganization(organizationId: string): Promise<void> {
+  const apiUrl = process.env['CROWD_SEARCH_SYNC_API_URL']
+  if (!apiUrl) {
+    throw new Error('CROWD_SEARCH_SYNC_API_URL environment variable is not set')
+  }
+
   const syncApi = new SearchSyncApiClient({
-    baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'],
+    baseUrl: apiUrl,
   })
 
-  await syncApi.triggerOrganizationSync(organizationId, undefined, { withAggs: false })
+  try {
+    await syncApi.triggerOrganizationSync(organizationId, undefined, { withAggs: false })
+  } catch (error) {
+    svc.log.error({ error, organizationId }, 'Failed to sync organization')
+    throw error
+  }
 }

Likely invalid or redundant comment.


439-480: 🛠️ Refactor suggestion

Extract date comparison logic into helper functions.

The doesIncomingOrgExistInExistingOrgs function has complex date comparison logic that could be simplified.

Consider extracting helper functions:

+function getVerifiedPrimaryDomains(identities: any[]): string[] {
+  return identities
+    .filter(i => i.type === OrganizationIdentityType.PRIMARY_DOMAIN && i.verified)
+    .map(i => i.value);
+}
+
+function isSameMonthYear(date1: Date | null, date2: Date | null): boolean {
+  if (!date1 && !date2) return true;
+  if (!date1 || !date2) return false;
+  return date1.getMonth() === date2.getMonth() && 
+         date1.getFullYear() === date2.getFullYear();
+}
+
+function areOrganizationNamesSimilar(name1: string, name2: string): boolean {
+  const n1 = name1.toLowerCase();
+  const n2 = name2.toLowerCase();
+  return n1.includes(n2) || n2.includes(n1);
+}
+
 export function doesIncomingOrgExistInExistingOrgs(
   existingOrg: IMemberOrganizationData,
   incomingOrg: IMemberEnrichmentDataNormalizedOrganization,
 ): boolean {
-  const incomingVerifiedPrimaryDomainIdentityValues = incomingOrg.identities
-    .filter((i) => i.type === OrganizationIdentityType.PRIMARY_DOMAIN && i.verified)
-    .map((i) => i.value)
-
-  const existingVerifiedPrimaryDomainIdentityValues = existingOrg.identities
-    .filter((i) => i.type === OrganizationIdentityType.PRIMARY_DOMAIN && i.verified)
-    .map((i) => i.value)
+  const incomingDomains = getVerifiedPrimaryDomains(incomingOrg.identities);
+  const existingDomains = getVerifiedPrimaryDomains(existingOrg.identities);
 
   const incomingOrgStartDate = incomingOrg.startDate ? new Date(incomingOrg.startDate) : null
   const incomingOrgEndDate = incomingOrg.endDate ? new Date(incomingOrg.endDate) : null
   const existingOrgStartDate = existingOrg.dateStart ? new Date(existingOrg.dateStart) : null
   const existingOrgEndEndDate = existingOrg.dateEnd ? new Date(existingOrg.dateEnd) : null
 
-  const isSameStartMonthYear =
-    (!incomingOrgStartDate && !existingOrgStartDate) ||
-    (incomingOrgStartDate &&
-      existingOrgStartDate &&
-      incomingOrgStartDate.getMonth() === existingOrgStartDate.getMonth() &&
-      incomingOrgStartDate.getFullYear() === existingOrgStartDate.getFullYear())
-
-  const isSameEndMonthYear =
-    (!incomingOrgEndDate && !existingOrgEndEndDate) ||
-    (incomingOrgEndDate &&
-      existingOrgEndEndDate &&
-      incomingOrgEndDate.getMonth() === existingOrgEndEndDate.getMonth() &&
-      incomingOrgEndDate.getFullYear() === existingOrgEndEndDate.getFullYear())
+  const hasSameStartDate = isSameMonthYear(incomingOrgStartDate, existingOrgStartDate);
+  const hasSameEndDate = isSameMonthYear(incomingOrgEndDate, existingOrgEndEndDate);
 
   return (
-    hasIntersection(
-      incomingVerifiedPrimaryDomainIdentityValues,
-      existingVerifiedPrimaryDomainIdentityValues,
-    ) ||
-    ((existingOrg.orgName.toLowerCase().includes(incomingOrg.name.toLowerCase()) ||
-      incomingOrg.name.toLowerCase().includes(existingOrg.orgName.toLowerCase())) &&
-      ((isSameStartMonthYear && isSameEndMonthYear) || incomingOrg.title === existingOrg.jobTitle))
+    hasIntersection(incomingDomains, existingDomains) ||
+    (areOrganizationNamesSimilar(existingOrg.orgName, incomingOrg.name) &&
+     ((hasSameStartDate && hasSameEndDate) || incomingOrg.title === existingOrg.jobTitle))
   )
 }

Likely invalid or redundant comment.

Comment on lines +133 to +136
export const hasIntersection = (arr1: string[], arr2: string[]): boolean => {
const set1 = new Set(arr1)
return arr2.some((item) => set1.has(item))
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add input validation and edge case handling.

The hasIntersection function implementation is efficient using Set, but could be more robust with additional checks.

Consider this enhanced implementation:

-export const hasIntersection = (arr1: string[], arr2: string[]): boolean => {
-  const set1 = new Set(arr1)
-  return arr2.some((item) => set1.has(item))
-}
+export const hasIntersection = (arr1: string[], arr2: string[]): boolean => {
+  // Validate inputs
+  if (!Array.isArray(arr1) || !Array.isArray(arr2)) {
+    throw new TypeError('Both arguments must be arrays')
+  }
+
+  // Handle empty arrays
+  if (arr1.length === 0 || arr2.length === 0) {
+    return false
+  }
+
+  // Create Set from the smaller array for better performance
+  const [smallerArr, largerArr] = arr1.length < arr2.length ? [arr1, arr2] : [arr2, arr1]
+  const set = new Set(smallerArr.map(item => item?.toLowerCase()))
+
+  // Check for intersection with case-insensitive comparison
+  return largerArr.some(item => item && set.has(item.toLowerCase()))
+}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export const hasIntersection = (arr1: string[], arr2: string[]): boolean => {
const set1 = new Set(arr1)
return arr2.some((item) => set1.has(item))
}
export const hasIntersection = (arr1: string[], arr2: string[]): boolean => {
// Validate inputs
if (!Array.isArray(arr1) || !Array.isArray(arr2)) {
throw new TypeError('Both arguments must be arrays')
}
// Handle empty arrays
if (arr1.length === 0 || arr2.length === 0) {
return false
}
// Create Set from the smaller array for better performance
const [smallerArr, largerArr] = arr1.length < arr2.length ? [arr1, arr2] : [arr2, arr1]
const set = new Set(smallerArr.map(item => item?.toLowerCase()))
// Check for intersection with case-insensitive comparison
return largerArr.some(item => item && set.has(item.toLowerCase()))
}

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (2)

445-485: Refactor date comparison logic for better maintainability

The date comparison logic is duplicated and could be extracted into a helper function for better maintainability.

Consider this refactoring:

+function isSameMonthYear(date1: Date | null, date2: Date | null): boolean {
+  return (!date1 && !date2) || // Both dates are null
+    (date1 && date2 &&
+      date1.getMonth() === date2.getMonth() &&
+      date1.getFullYear() === date2.getFullYear())
+}

 export function doesIncomingOrgExistInExistingOrgs(
   existingOrg: IMemberOrganizationData,
   incomingOrg: IMemberEnrichmentDataNormalizedOrganization,
 ): boolean {
   // ... existing identity filtering code ...

   const incomingOrgStartDate = incomingOrg.startDate ? new Date(incomingOrg.startDate) : null
   const incomingOrgEndDate = incomingOrg.endDate ? new Date(incomingOrg.endDate) : null
   const existingOrgStartDate = existingOrg.dateStart ? new Date(existingOrg.dateStart) : null
   const existingOrgEndEndDate = existingOrg.dateEnd ? new Date(existingOrg.dateEnd) : null

-  const isSameStartMonthYear =
-    (!incomingOrgStartDate && !existingOrgStartDate) || // Both start dates are null
-    (incomingOrgStartDate &&
-      existingOrgStartDate &&
-      incomingOrgStartDate.getMonth() === existingOrgStartDate.getMonth() &&
-      incomingOrgStartDate.getFullYear() === existingOrgStartDate.getFullYear())
+  const isSameStartMonthYear = isSameMonthYear(incomingOrgStartDate, existingOrgStartDate)
+  const isSameEndMonthYear = isSameMonthYear(incomingOrgEndDate, existingOrgEndEndDate)

-  const isSameEndMonthYear =
-    (!incomingOrgEndDate && !existingOrgEndEndDate) || // Both end dates are null
-    (incomingOrgEndDate &&
-      existingOrgEndEndDate &&
-      incomingOrgEndDate.getMonth() === existingOrgEndEndDate.getMonth() &&
-      incomingOrgEndDate.getFullYear() === existingOrgEndEndDate.getFullYear())

   return (
     hasIntersection(
       incomingVerifiedPrimaryDomainIdentityValues,
       existingVerifiedPrimaryDomainIdentityValues,
     ) ||
     ((existingOrg.orgName.toLowerCase().includes(incomingOrg.name.toLowerCase()) ||
       incomingOrg.name.toLowerCase().includes(existingOrg.orgName.toLowerCase())) &&
       ((isSameStartMonthYear && isSameEndMonthYear) || incomingOrg.title === existingOrg.jobTitle))
   )
 }

Line range hint 847-925: Consider splitting the work experience preparation logic

The prepareWorkExperiences function is becoming complex with multiple responsibilities. Consider splitting it into smaller, focused functions.

Suggested structure:

interface IWorkExperiencePreparation {
  processHighConfidenceSource(): IWorkExperienceChanges;
  processManualChanges(): IWorkExperienceChanges;
  processNewExperiences(): IWorkExperienceChanges;
}

class WorkExperiencePreparation implements IWorkExperiencePreparation {
  constructor(
    private oldVersion: IMemberOrganizationData[],
    private newVersion: IMemberEnrichmentDataNormalizedOrganization[],
    private isHighConfidenceSource: boolean
  ) {}

  prepare(): IWorkExperienceChanges {
    if (this.isHighConfidenceSource) {
      return this.processHighConfidenceSource();
    }
    return this.processRegularSource();
  }

  private processRegularSource(): IWorkExperienceChanges {
    const manualChanges = this.processManualChanges();
    const newExperiences = this.processNewExperiences();
    return this.mergeChanges(manualChanges, newExperiences);
  }
  
  // ... implement other methods
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b8d5d2b and 7318e16.

📒 Files selected for processing (1)
  • services/apps/members_enrichment_worker/src/activities/enrichment.ts (19 hunks)
🔇 Additional comments (5)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (5)

3-8: LGTM!

The new imports are properly aligned with the added functionality and follow the module's organization pattern.


239-239: LGTM: Logging level adjustments

The change from info to debug level is appropriate for these operational logs.

Also applies to: 258-258, 290-290, 308-308


940-953: ⚠️ Potential issue

Add error handling and environment validation for sync functions

The sync functions need proper error handling and environment validation.

Apply this diff:

+function validateSyncApiUrl(): string {
+  const apiUrl = process.env['CROWD_SEARCH_SYNC_API_URL'];
+  if (!apiUrl) {
+    throw new Error('CROWD_SEARCH_SYNC_API_URL environment variable is not set');
+  }
+  return apiUrl;
+}

 export async function syncMember(memberId: string): Promise<void> {
+  const apiUrl = validateSyncApiUrl();
   const syncApi = new SearchSyncApiClient({
-    baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'],
+    baseUrl: apiUrl,
   })

-  await syncApi.triggerMemberSync(memberId, { withAggs: false })
+  try {
+    await syncApi.triggerMemberSync(memberId, { withAggs: false })
+  } catch (error) {
+    svc.log.error({ error, memberId }, 'Failed to sync member')
+    throw error
+  }
 }

 export async function syncOrganization(organizationId: string): Promise<void> {
+  const apiUrl = validateSyncApiUrl();
   const syncApi = new SearchSyncApiClient({
-    baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'],
+    baseUrl: apiUrl,
   })

-  await syncApi.triggerOrganizationSync(organizationId, undefined, { withAggs: false })
+  try {
+    await syncApi.triggerOrganizationSync(organizationId, undefined, { withAggs: false })
+  } catch (error) {
+    svc.log.error({ error, organizationId }, 'Failed to sync organization')
+    throw error
+  }
 }

426-438: ⚠️ Potential issue

Add error handling for sync operations

The sync operations after member updates lack error handling. A failure in sync operations could leave the system in an inconsistent state.

Apply this diff:

-    await Promise.all(promises)
+    try {
+      await Promise.all(promises)
+    } catch (error) {
+      svc.log.error({ error, memberId }, 'Failed to process member updates')
+      throw error
+    }

     if (updated) {
       await setMemberEnrichmentUpdateDateDb(tx.transaction(), memberId)
-      await syncMember(memberId)
+      try {
+        await syncMember(memberId)
+      } catch (error) {
+        svc.log.error({ error, memberId }, 'Failed to sync member')
+        throw error
+      }
     } else {
       await setMemberEnrichmentTryDateDb(tx.transaction(), memberId)
     }

-    for (const orgId of orgIdsToSync) {
-      await syncOrganization(orgId)
+    try {
+      await Promise.all(orgIdsToSync.map(orgId => syncOrganization(orgId)))
+    } catch (error) {
+      svc.log.error({ error, orgIds: orgIdsToSync }, 'Failed to sync organizations')
+      throw error
+    }

Likely invalid or redundant comment.


230-231: Verify high confidence source handling

The new parameter isHighConfidenceSourceSelectedForWorkExperiences has significant impact on work experience data. Let's verify its usage across the codebase.

✅ Verification successful

Parameter usage is properly implemented and contained

The isHighConfidenceSourceSelectedForWorkExperiences parameter is used correctly within a well-defined scope:

  • It's passed from updateMemberUsingSquashedPayload to prepareWorkExperiences
  • In prepareWorkExperiences, it controls whether to completely replace work experiences (when true) or preserve manually created ones (when false)
  • The implementation is self-contained and doesn't require changes in other parts of the codebase
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check how isHighConfidenceSourceSelectedForWorkExperiences is used
# Expected: Find all calls to updateMemberUsingSquashedPayload to verify proper parameter passing

rg -A 5 "updateMemberUsingSquashedPayload.*isHighConfidenceSourceSelectedForWorkExperiences"

Length of output: 94


Script:

#!/bin/bash
# Let's first find the file containing the function definition
fd "enrichment.ts" --exec ast-grep --pattern 'function updateMemberUsingSquashedPayload($_) {
  $$$
}'

# Also search for any references to this parameter
rg "isHighConfidenceSourceSelectedForWorkExperiences"

Length of output: 670


Script:

#!/bin/bash
# Let's see the context around these usages to understand the implementation
rg -B 5 -A 10 "isHighConfidenceSourceSelectedForWorkExperiences" services/apps/members_enrichment_worker/src/activities/enrichment.ts

Length of output: 2094

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
services/libs/opensearch/src/service/organization.sync.service.ts (1)

417-422: Add error logging for skipped organizations

The null check is a good addition to prevent processing undefined organizations. However, consider adding error logging when an organization is skipped to help with debugging.

 if (base) {
   const data = await buildFullOrgForMergeSuggestions(qx, base)
   const prefixed = OrganizationSyncService.prefixData(data)
   await this.openSearchService.index(orgId, OpenSearchIndex.ORGANIZATIONS, prefixed)
+} else {
+  this.log.warn({ orgId }, 'Organization not found, skipping OpenSearch indexing')
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7318e16 and 47f9263.

📒 Files selected for processing (1)
  • services/libs/opensearch/src/service/organization.sync.service.ts (1 hunks)
🔇 Additional comments (1)
services/libs/opensearch/src/service/organization.sync.service.ts (1)

417-422: Verify member enrichment impact

Since this change is part of member enrichment tweaks, let's verify that skipping non-existent organizations doesn't affect member processing.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (2)

853-854: Document the high confidence source parameter.

The isHighConfidenceSourceSelectedForWorkExperiences parameter needs documentation explaining its purpose and impact on work experience processing.

Add JSDoc comment:

+/**
+ * Prepares work experiences for update by comparing old and new versions.
+ * @param oldVersion - Existing work experiences in the database
+ * @param newVersion - New work experiences from enrichment
+ * @param isHighConfidenceSourceSelectedForWorkExperiences - When true, replaces all existing work experiences with new ones
+ * @returns Object containing work experiences to delete, create, and update
+ */
 function prepareWorkExperiences(

919-920: Improve type safety in work experience updates.

Using Record<string, any> reduces type safety. Consider creating a specific interface for work experience updates.

+interface IWorkExperienceUpdate {
+  dateEnd?: string | null;
+  // add other possible update fields
+}

-const toUpdateInner: Record<string, any> = {}
+const toUpdateInner: IWorkExperienceUpdate = {}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 47f9263 and 1606a84.

📒 Files selected for processing (1)
  • services/apps/members_enrichment_worker/src/activities/enrichment.ts (19 hunks)
🔇 Additional comments (3)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (3)

3-8: LGTM: Import changes are well-structured.

The new imports from @crowd/common are properly added and utilized throughout the code.


946-959: ⚠️ Potential issue

Add error handling and environment validation for sync functions.

The sync functions need proper error handling and environment variable validation.

This issue was previously identified in a past review. The same fixes for error handling and environment validation should be applied here.


239-239: Verify logging visibility after level changes.

The change from info to debug level is appropriate for these operational logs. However, ensure that these events are still adequately visible for troubleshooting when needed.

Also applies to: 258-258, 290-290, 308-308

✅ Verification successful

Based on the log level usage patterns shown in the codebase, I can now generate the final response:

Log level changes align with codebase patterns

The change from info to debug for these operational logs is consistent with the codebase's logging patterns where:

  • debug is used for detailed operational tracing (e.g., query execution, message processing)
  • info is used for significant state changes and important operations
  • warn is used for concerning but non-critical issues
  • error is used for failures and critical issues

The changed logs are appropriately set to debug as they trace routine member identity processing operations, similar to other debug-level operational logs in the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for other log levels in the codebase to ensure consistent logging patterns
rg -A 1 "log\.(info|debug|warn|error)" --type ts

Length of output: 70664

Comment on lines +379 to +385
orgIdsToSync.map((orgId) =>
syncOrganization(orgId).catch((error) => {
console.error(`Failed to sync organization with ID ${orgId}:`, error)
}),
),
),
),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Improve error handling in organization sync operations.

The current error handling has several issues:

  1. Uses console.error instead of the service logger (svc.log.error)
  2. Silently catches errors without proper error propagation

Apply this fix:

 orgIdsToSync.map((orgId) =>
-  syncOrganization(orgId).catch((error) => {
-    console.error(`Failed to sync organization with ID ${orgId}:`, error)
-  }),
+  syncOrganization(orgId).catch((error) => {
+    svc.log.error({ error, orgId }, 'Failed to sync organization')
+    throw error
+  }),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
orgIdsToSync.map((orgId) =>
syncOrganization(orgId).catch((error) => {
console.error(`Failed to sync organization with ID ${orgId}:`, error)
}),
),
),
),
orgIdsToSync.map((orgId) =>
syncOrganization(orgId).catch((error) => {
svc.log.error({ error, orgId }, 'Failed to sync organization')
throw error
}),
),
),
),

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (1)

453-494: Refactor organization matching logic for better maintainability.

The organization matching logic is complex and could benefit from:

  1. Extracting date comparison logic into helper functions
  2. Adding comments explaining the matching criteria

Consider this refactor:

+function isSameMonthYear(date1: Date | null, date2: Date | null): boolean {
+  return (!date1 && !date2) || // Both dates are null
+    (date1 && date2 &&
+      date1.getMonth() === date2.getMonth() &&
+      date1.getFullYear() === date2.getFullYear())
+}

 export function doesIncomingOrgExistInExistingOrgs(
   existingOrg: IMemberOrganizationData,
   incomingOrg: IMemberEnrichmentDataNormalizedOrganization,
 ): boolean {
   // Extract domain identities for comparison
   const incomingVerifiedPrimaryDomainIdentityValues = incomingOrg.identities
     .filter((i) => i.type === OrganizationIdentityType.PRIMARY_DOMAIN && i.verified)
     .map((i) => i.value)

   const existingVerifiedPrimaryDomainIdentityValues = existingOrg.identities
     .filter((i) => i.type === OrganizationIdentityType.PRIMARY_DOMAIN && i.verified)
     .map((i) => i.value)

   const incomingOrgStartDate = incomingOrg.startDate ? new Date(incomingOrg.startDate) : null
   const incomingOrgEndDate = incomingOrg.endDate ? new Date(incomingOrg.endDate) : null
   const existingOrgStartDate = existingOrg.dateStart ? new Date(existingOrg.dateStart) : null
   const existingOrgEndEndDate = existingOrg.dateEnd ? new Date(existingOrg.dateEnd) : null

-  const isSameStartMonthYear =
-    (!incomingOrgStartDate && !existingOrgStartDate) || // Both start dates are null
-    (incomingOrgStartDate &&
-      existingOrgStartDate &&
-      incomingOrgStartDate.getMonth() === existingOrgStartDate.getMonth() &&
-      incomingOrgStartDate.getFullYear() === existingOrgStartDate.getFullYear())
+  const isSameStartMonthYear = isSameMonthYear(incomingOrgStartDate, existingOrgStartDate)
+  const isSameEndMonthYear = isSameMonthYear(incomingOrgEndDate, existingOrgEndEndDate)

   return (
     hasIntersection(
       incomingVerifiedPrimaryDomainIdentityValues,
       existingVerifiedPrimaryDomainIdentityValues,
     ) ||
     ((existingOrg.orgName.toLowerCase().includes(incomingOrg.name.toLowerCase()) ||
       incomingOrg.name.toLowerCase().includes(existingOrg.orgName.toLowerCase())) &&
       ((isSameStartMonthYear && isSameEndMonthYear) || incomingOrg.title === existingOrg.jobTitle))
   )
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c87f732 and 6ae3a91.

📒 Files selected for processing (1)
  • services/apps/members_enrichment_worker/src/activities/enrichment.ts (18 hunks)
🔇 Additional comments (3)
services/apps/members_enrichment_worker/src/activities/enrichment.ts (3)

Line range hint 3-46: LGTM: Import statements are properly organized.

The new imports are correctly added and are used throughout the code.


381-387: ⚠️ Potential issue

Improve error handling in organization sync operations.

The current error handling has several issues:

  1. Uses console.error instead of the service logger
  2. Silently catches errors without proper error propagation

Apply this fix:

 orgIdsToSync.map((orgId) =>
-  syncOrganization(orgId).catch((error) => {
-    console.error(`Failed to sync organization with ID ${orgId}:`, error)
-  }),
+  syncOrganization(orgId).catch((error) => {
+    svc.log.error({ error, orgId }, 'Failed to sync organization')
+    throw error
+  }),

Likely invalid or redundant comment.


953-966: ⚠️ Potential issue

Add error handling and environment validation for sync functions.

The sync functions need:

  1. Error handling for API calls
  2. Validation of environment variables

Add error handling:

 export async function syncMember(memberId: string): Promise<void> {
+  const apiUrl = process.env['CROWD_SEARCH_SYNC_API_URL']
+  if (!apiUrl) {
+    throw new Error('CROWD_SEARCH_SYNC_API_URL environment variable is not set')
+  }
+
   const syncApi = new SearchSyncApiClient({
-    baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'],
+    baseUrl: apiUrl,
   })

-  await syncApi.triggerMemberSync(memberId, { withAggs: false })
+  try {
+    await syncApi.triggerMemberSync(memberId, { withAggs: false })
+  } catch (error) {
+    svc.log.error({ error, memberId }, 'Failed to sync member')
+    throw error
+  }
 }

 export async function syncOrganization(organizationId: string): Promise<void> {
+  const apiUrl = process.env['CROWD_SEARCH_SYNC_API_URL']
+  if (!apiUrl) {
+    throw new Error('CROWD_SEARCH_SYNC_API_URL environment variable is not set')
+  }
+
   const syncApi = new SearchSyncApiClient({
-    baseUrl: process.env['CROWD_SEARCH_SYNC_API_URL'],
+    baseUrl: apiUrl,
   })

-  await syncApi.triggerOrganizationSync(organizationId, undefined, { withAggs: false })
+  try {
+    await syncApi.triggerOrganizationSync(organizationId, undefined, { withAggs: false })
+  } catch (error) {
+    svc.log.error({ error, organizationId }, 'Failed to sync organization')
+    throw error
+  }
 }

Likely invalid or redundant comment.

@epipav epipav merged commit 157de36 into main Dec 10, 2024
8 checks passed
@epipav epipav deleted the bugfix/check-org-existence-in-enrichment-fix branch December 10, 2024 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant