optimize ctpop codegen by width and narrow ctpop intrinsic in IR #43033

|  |  |
| --- | --- |
| Bugzilla Link | [43688](https://llvm.org/bz43688) |
| Version | trunk |
| OS | All |
| CC | @davidbolvansky,@RKSimon |

## Extended Description 
Spinning this off from bug 43656:
```llvm
define i32 @zpop(i8 %x) {
  %z = zext i8 %x to i32
  %pop = tail call i32 @llvm.ctpop.i32(i32 %z)
  ret i32 %pop
}

define i32 @popz(i8 %x) {
  %pop = tail call i8 @llvm.ctpop.i8(i8 %x)
  %z = zext i8 %pop to i32
  ret i32 %z
}

declare i8 @llvm.ctpop.i8(i8)
declare i32 @llvm.ctpop.i32(i32)
```
--------------------------------------------------------------------------

These are equivalent, so we should try to canonicalize them in IR. The narrow call is likely better for vectorization and would line up with our transforms of most math/logic ops. 
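
Because an i8 has only 256 values, the equivalence can be checked exhaustively. The following is a minimal C sketch, not LLVM code: `popcount_ref`, `zpop`, `popz`, and `count_mismatches` are hypothetical helper names that model the two IR functions above, with a C integer conversion standing in for `zext`.

```c
#include <stdint.h>

/* Reference popcount: clear the lowest set bit until none remain. */
unsigned popcount_ref(uint32_t v) {
    unsigned n = 0;
    while (v) {
        v &= v - 1;
        n++;
    }
    return n;
}

/* Models @zpop: zero-extend to 32 bits, then count. */
uint32_t zpop(uint8_t x) {
    uint32_t z = x;                 /* zext i8 -> i32 */
    return popcount_ref(z);         /* ctpop.i32 */
}

/* Models @popz: count at 8 bits, then zero-extend. */
uint32_t popz(uint8_t x) {
    uint8_t pop = (uint8_t)popcount_ref(x); /* ctpop.i8, result is at most 8 */
    return pop;                             /* zext i8 -> i32 */
}

/* Returns how many of the 256 possible i8 inputs make the two forms disagree. */
int count_mismatches(void) {
    int bad = 0;
    for (unsigned x = 0; x < 256; x++) {
        if (zpop((uint8_t)x) != popz((uint8_t)x))
            bad++;
    }
    return bad;
}
```

`count_mismatches()` returns 0: zero-extension only adds zero bits, and the count of an 8-bit value never exceeds 8, so it always fits in the narrow result type.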

But we don't have the DAGCombiner and/or legalization support needed to ensure that the narrow call is optimized in codegen.

For example, on base x86-64:
```asm
_zpop:                                  ## @zpop
	movzbl	%dil, %eax
	movl	%eax, %ecx
	shrl	%ecx
	andl	$-43, %ecx
	subl	%ecx, %eax
	movl	%eax, %ecx
	andl	$858993459, %ecx        ## imm = 0x33333333
	shrl	$2, %eax
	andl	$858993459, %eax        ## imm = 0x33333333
	addl	%ecx, %eax
	movl	%eax, %ecx
	shrl	$4, %ecx
	addl	%eax, %ecx
	andl	$252645135, %ecx        ## imm = 0xF0F0F0F
	imull	$16843009, %ecx, %eax   ## imm = 0x1010101
	shrl	$24, %eax
	retq
	
_popz:                                  ## @popz
	movl	%edi, %eax
	shrb	%al
	andb	$85, %al
	subb	%al, %dil
	movl	%edi, %eax
	andb	$51, %al
	shrb	$2, %dil
	andb	$51, %dil
	addb	%al, %dil
	movl	%edi, %eax
	shrb	$4, %al
	addb	%dil, %al
	andb	$15, %al
	movzbl	%al, %eax
	retq
```
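
For readers who prefer C to assembly, both listings follow the usual parallel bit-counting expansion. The sketch below is an approximation of that pattern at each width, not output from or code in LLVM. The function names are hypothetical, and the 32-bit version uses the full-width masks rather than whatever narrowed immediates the backend picks.

```c
#include <stdint.h>

/* 32-bit expansion, following the shape of the _zpop listing. */
uint32_t ctpop32_expand(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);                 /* 2-bit partial sums */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* 4-bit partial sums */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* per-byte sums */
    return (v * 0x01010101u) >> 24;                   /* add the four bytes */
}

/* 8-bit expansion, following the shape of the _popz listing. */
uint8_t ctpop8_expand(uint8_t v) {
    v = (uint8_t)(v - ((v >> 1) & 0x55));             /* 2-bit partial sums */
    v = (uint8_t)((v & 0x33) + ((v >> 2) & 0x33));    /* 4-bit partial sums */
    v = (uint8_t)((v + (v >> 4)) & 0x0F);             /* single byte sum */
    return v;                                         /* no multiply or final shift */
}

/* Returns how many i8 inputs give different results at the two widths. */
int expansions_disagree(void) {
    int bad = 0;
    for (unsigned x = 0; x < 256; x++) {
        if (ctpop32_expand(x) != ctpop8_expand((uint8_t)x))
            bad++;
    }
    return bad;
}
```

The 8-bit form has only one byte to sum, so it needs no multiply and no final shift. That matches the absence of `imull` and `shrl $24` in `_popz`.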
