Skip to content

Schematic update recorder indices need to be different for BOM files #14558

@devversion

Description

@devversion

🐞 Bug report

Package

@angular-devkit/schematics

Is this a regression?

No

Description

Currently schematics which leverage the public UpdateRecorder API to make source-file transformations will get different results for source-files which contain a BOM.

This is because the CLI magically tries to account for the BOM in source-files by always adding 1 to passed indices. This is unexpected and causes confusing behavior as this implies that the UpdateRecorder only expects character indices for visible characters.

Schematics that currently determine the character indices by parsing the source-files (e.g. through TypeScript compiler API or fs.readFileSync) will get broken behavior for source-files that contain a BOM as the replacements are accidentally shifted by one.

We need to do one of the following:

  1. Either provide a clear API description explaining the story with BOM and what indices the
    update recorder expects. This forces schematic authors to know about BOM and to strip the BOM when reading source-files.

  2. A long-term solution where the UpdateRecorder expects character indices which are truly based on the source-file. No magic differentiation between BOM / non-BOM files.

@alan-agius4 wrote

The main reason for this is that when a developer is writing a schematic they shouldn't need to compute the offset based if a file has a BOM or not and is handled out of the box.

I want to add to that: I see the point of doing that, but it is simply resulting in magic behavior as schematic indices are not determined by just looking at the visible rendered characters but rather by parsed source-files where the BOM is a valid invisible character that still adds to the amount characters.

e.g. fs.readFileSync('./with_bom.txt', 'utf8').length will respect the BOM characters and therefore it's common that indices passed to the update recorder will be truly absolute.

Ideally the magic differentation between BOM and non-BOM files would have not happened at all, but for now the easiest way to fix this in a backwards-compatible way (where older CLI versions are used) is to strip the BOM when parsing the TypeScript source-files.

We need to find a proper solution for this in the future though. It adds unnecessary bloat and confusion to schematic authors as they need to know about different behavior with BOM / and need to handle BOM characters in a special way for now. See the unnecessary bloat we need to add:

A clear and concise description of the problem...

🔬 Minimal Reproduction

See test-case in angular/angular#30719 or various other affected issues:

🔥 Exception or Error

Replacements are shifted by one character if the source-file contains a BOM. Can result in
broken static-query migration where ViewChild will be converted to VViewChild or in a broken lazy route syntax migration where routes are incorrectly converted (#14551)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions