language in HTML-Code for ::addHTML #1415

@tom-lp-at

Description

This is:

Expected Behavior

To find the correct language for the <span lang="DE">

Please describe the behavior you are expecting.

process the addHTML - independent of which lang-code is in the

Current Behavior

no DOCX is generated

What is the current behavior?

phpWord crashes if a language code is not defined

Failure Information

PHP Fatal error: Uncaught InvalidArgumentException: DE is not a valid language code in /vendor/phpoffice/phpword/src/PhpWord/Style/Language.php:226

Please help provide information about the failure.
The source is copied and pasted from an Outlook email into a CKEditor instance, stored in MySQL, and then passed to ::addHTML():
<p class="MsoNormal"><strong><span lang="DE" style="color:windowtext; mso-fareast-language:DE-AT">

How to Reproduce

Use the code above

Please provide a code sample that reproduces the issue.

<?php
require __DIR__ . '/vendor/autoload.php';

$phpWord = new \PhpOffice\PhpWord\PhpWord();
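$section = $phpWord->addSection();
$table = $section->addTable();
// $arr_follow and $TableCellStyle are defined elsewhere in the application (not shown)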
foreach($arr_follow as $key_ticket => $followups) {
	if (strlen($followups['contents']) > 0) {
		$content = Tom_excerp_voll($followups['contents']);
		$table->addRow();
		$table->addCell()->addText($followups['date'],$TableCellStyle);
		$table->addCell()->addText($followups['dauer'],$TableCellStyle);
		$table->addCell()->addText($followups['author'],$TableCellStyle);
		$zelle = $table->addCell(6600,$TableCellStyle);
		\PhpOffice\PhpWord\Shared\Html::addHtml($zelle, $content,false,false,null,["font" => array("size" => 6)]);
	}
}

function Tom_excerp_voll($text) {
        // These are all the tags that have to be removed for it to work properly :(
 	$pattern = array('/<div.*?>/','/<\/div>/','/<p.*?>/','/<\/p>/','/<a.*?>/','/<\/a>/','/<img.*?>/','/<strong>/','/<\/strong>/');
//  	$replace = array('','','','<br/>','','',''); // for testing
//  	$replace = array('','','','','','','','<div><b>','</b></div>'); // for testing - no success :(
 	$replace = array('','','','','','','','','');
 	$text = preg_replace($pattern,$replace, $text);

        // To avoid a double line-break on <br />
	$text = str_replace("<br />","</span><span>",'<span>'.$text.'</span>');

        // This workaround changes the lang to a "valid" language code
 	$text = str_replace("lang=\"DE\"","lang='de-DE'",$text);

        // Remove mail addresses like "Max Mustermann <[email protected]>"
        // This kind of address is recognized as a tag, so I have to remove it for DOM->loadXML()
 	$text = preg_replace("/<(\/)?([a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})>/","",$text);

        // Remove whitespace between tags - nested tables sometimes contain a massive amount of it
 	$text = preg_replace('/(\>)\s*(\<)/m', '$1$2', $text);

        // Remove all tabs - nested tables sometimes contain a massive number of them
	$text = trim(preg_replace('/\t+/', '', $text));
	return $text;
}

Context

The function "Tom_excerp_voll" is included only to document which workarounds are currently needed to get a "valid" document (with the restriction that I lose most of the styling: bold, italic, ...).

I think the main problem is the diversity of HTML sources. In my case it is an email that was copied and pasted from Outlook. There (Outlook client), the "real" language code is provided by "mso-fareast-language" in the style attribute and not by lang=, which holds only a short language code.
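
To illustrate the point about "mso-fareast-language", here is a minimal sketch of how the full code could be read out of the style attribute. `langFromMsoStyle()` is a hypothetical helper, not part of PHPWord, and it assumes the style string looks like the Outlook sample in this issue:

```php
<?php
/**
 * Hypothetical helper (not part of PHPWord): returns the value of
 * mso-fareast-language from an inline style string, or null if absent.
 */
function langFromMsoStyle(string $style): ?string
{
    // Matches e.g. "mso-fareast-language:DE-AT" and captures "DE" and "AT".
    if (preg_match('/mso-fareast-language\s*:\s*([A-Za-z]{2})-([A-Za-z]{2})/i', $style, $m)) {
        // Normalize the casing to the xx-YY form, e.g. "de-AT".
        return strtolower($m[1]) . '-' . strtoupper($m[2]);
    }
    return null;
}

$style = 'color:windowtext; mso-fareast-language:DE-AT';
echo langFromMsoStyle($style), PHP_EOL; // prints "de-AT"
```

Note that a code recovered this way (here "de-AT") may still be rejected by PHPWord, so this alone would not prevent the exception.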

But phpWord only checks lang= against a few language codes ('de-AT' is not among them either).
I am sure there are more language codes than the 13 that are currently in phpword/src/PhpWord/Style/Language.php :)

Now the Feature-Request :)

In my opinion it would be best to have a fallback language code that is declared in the settings (and customizable by the programmer) to avoid the crashes ... (or something else ...)
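
Until something like that exists, the fallback idea could be approximated on the caller's side before the HTML is handed to `Html::addHtml()`. This is only a sketch: `normalizeLangAttributes()` is a hypothetical helper, not a PHPWord API, and the list of accepted codes passed to it is an assumption that would have to match what the installed PHPWord version really accepts:

```php
<?php
/**
 * Hypothetical helper (not part of PHPWord): rewrites every lang="..."
 * attribute so that it holds either an accepted code or the fallback.
 *
 * $accepted is an assumption; fill it with the codes your PHPWord
 * version actually accepts.
 */
function normalizeLangAttributes(string $html, array $accepted, string $fallback): string
{
    $result = preg_replace_callback(
        // \b keeps "mso-fareast-language:..." inside style attributes untouched,
        // because "lang" there is not followed by "=".
        '/\blang\s*=\s*(["\'])(.*?)\1/i',
        function (array $m) use ($accepted, $fallback): string {
            foreach ($accepted as $code) {
                // Case-insensitive comparison, so "DE-de" maps to "de-DE".
                if (strcasecmp($code, $m[2]) === 0) {
                    return 'lang="' . $code . '"';
                }
            }
            // Unknown code (e.g. "DE"): use the fallback instead of crashing later.
            return 'lang="' . $fallback . '"';
        },
        $html
    );

    // preg_replace_callback() returns null on error; keep the input in that case.
    return $result ?? $html;
}

$html = '<span lang="DE" style="color:windowtext; mso-fareast-language:DE-AT">Hallo</span>';
echo normalizeLangAttributes($html, ['de-DE', 'en-US'], 'de-DE'), PHP_EOL;
```

With the sample span from this issue, the `lang="DE"` attribute becomes `lang="de-DE"` while the style attribute is left as it is.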

  • PHP version: 7.2.6
  • PHPWord version: dev-master (+some hints from other tickets)
