Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@ghost
Copy link

@ghost ghost commented Jun 22, 2018

No description provided.

// All ICU headers need to be included here so that all function prototypes are
// available before the function pointers are declared below.
#include <unicode/locid.h>
//#include <unicode/locid.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is not needed - delete it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, this is a C++ header, based on http://icu-project.org/apiref/icu4c/files.html

@jkotas
Copy link
Member

jkotas commented Jun 23, 2018

It would be nice to mark all the exports with DLLEXPORT like it is done in corefx, and hide everything else.

// Couldn't find the data we need for this locale, we should fallback.
if (localeNameBuf[0] == 0x0)
{
ures_close(rootResBundle);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to do this sort of error handling using goto done pattern. It avoids duplicating the cleanup and mistakes that the duplication tend to lead to.

@ghost
Copy link
Author

ghost commented Jun 23, 2018

I have set fvisibility hidden for this lib, ported pal_compiler.h from corefx and added pal_<name>.h corefx-style headers with DLLEXPORT markings.

@jkotas
Copy link
Member

jkotas commented Jun 23, 2018

@dotnet-bot test this please

// doing the KanaType custom rule.
customRules.reserve(512);
customRules = (UCharList*)malloc(sizeof(UCharList*));
customRules->items = calloc(512, sizeof(UChar*));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 512 always going to be enough?

Copy link
Author

@ghost ghost Jun 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added an AddItem function that checks and reallocate if it goes beyond the capacity.

@jkotas
Copy link
Member

jkotas commented Jun 23, 2018

The final binary seems to still depend on libstdc++.so.6. Is there something that needs to be done to actually trim the dependency?

@ghost
Copy link
Author

ghost commented Jun 23, 2018

Setting the language to C at top of CMakeLists wasn't enough, b9e9d3d. Had to do https://stackoverflow.com/a/29204610 and some more.
In CoreFX, there is no CXX compiler detection by cmake anymore, so this is not needed. I will double check.

return version;
}

__attribute__((destructor))
Copy link
Member

@jkotas jkotas Jun 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janvorli I think that this shutdown method is broken. There can be libicu code running during shutdown. Unloading icu here will cause intermittent crashes during process exit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried to make it crash and there does not appear to be a real problem. dlclose during shutdown seems to be ignored - the library is not actually unmapped.

int32_t length = uloc_getLanguage(locale, NULL, 0, &status) + 1;

std::vector<char> buf(length + 1, '\0');
char* buf = calloc(length, sizeof(char*));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be freed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check for null would be nice too.

int32_t icuPatternLength = unum_toPattern(pNumberFormat, false, NULL, 0, &ignore) + 1;

std::vector<UChar> icuPattern(icuPatternLength + 1, '\0');
UChar* icuPattern = calloc(icuPatternLength, sizeof(UChar*));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matching free?

@jkotas
Copy link
Member

jkotas commented Jun 24, 2018

Tests are failing

@ghost
Copy link
Author

ghost commented Jun 24, 2018

@jkotas, windows legs failing is a known/unrelated issue. Is there a way to tell what is causing those test crash from the build output in unix legs?

@jkotas
Copy link
Member

jkotas commented Jun 24, 2018

Is there a way to tell what is causing those test crash from the build output in unix legs?

It is crashing very early during startup on memory corruption. It is likely bug in your changes. It you run anything locally under debugger, you should be able to find the problem pretty quickly.

@ghost
Copy link
Author

ghost commented Jun 24, 2018

I haven't built the interloop testes locally, it takes enormous amount of time and resources. Can i download artifacts from somewhere?

@jkotas
Copy link
Member

jkotas commented Jun 24, 2018

I expect that the crash will repro even if you run a simple hello world. https://github.com/dotnet/coreclr#using-your-build has two ways you can run a simple hello world (I typically use the corerun path in situations like this.)

@ghost
Copy link
Author

ghost commented Jun 24, 2018

I expect that the crash will repro even if you run a simple hello world.

That would mean the non-interloop test legs that are passing are not running any tests?

@ghost
Copy link
Author

ghost commented Jun 24, 2018

my disk space is running full and couldn't build the whole thing. I was building only parts (GlobalizationNative) and travis-ci for mac.
@jkotas, if you could help spotting the issue with changes, I'll fix them, otherwise it will take sometime before i could get my hands on spare system to run debugger.

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will continue reviewing later today, but it seems I may have found the crash reason.

int32_t patternLen = udatpg_getBestPattern(pGenerator, patternSkeleton, -1, NULL, 0, &ignore) + 1;

std::vector<UChar> bestPattern(patternLen + 1, '\0');
UChar* bestPattern = calloc(patternLen, sizeof(UChar*));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you allocating one less char than the original version?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added + 1 in the line above.


std::vector<UChar> bestPattern(patternLen + 1, '\0');
UChar* bestPattern = calloc(patternLen, sizeof(UChar*));
if(bestPattern == NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit - missing space between the if and the opening parenthese.


std::vector<UChar> symbolBuf(symbolLen + 1, '\0');
UChar* symbolBuf = calloc(symbolLen, sizeof(UChar*));
if(symbolBuf == NULL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit - missing space between the if and the opening parenthese.

int symbolLen = udat_getSymbols(pFormat, type, i, NULL, 0, &ignore) + 1;

std::vector<UChar> symbolBuf(symbolLen + 1, '\0');
UChar* symbolBuf = calloc(symbolLen, sizeof(UChar*));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same question as in a similar case before - why are you allocating one less UChar than the original code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added 1 to symbolLen above

if (U_FAILURE(err))
{
udat_close(pFormat);
free(pFormat);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems it should be free(symbolBuf) instead.

@jkotas
Copy link
Member

jkotas commented Jun 24, 2018

Try doing this (using the .NET Core 2.1 SDK):

dotnet new console
dotnet publish -r linux-x64
... replace System.Globalization.Native.so with the one your have built ...

if you could help spotting the issue with changes

The bug is in GlobalizationNative_GetSortHandle I think.

@ghost
Copy link
Author

ghost commented Jun 24, 2018

@jkotas, simple cases are working. I am not sure which case will make it fail locally. I have tried with:

using System;
using System.Globalization;
using System.Threading;

public class Example
{
   public static void Main()
   {
      CultureInfo culture1 = CultureInfo.CurrentCulture;
      CultureInfo culture2 = Thread.CurrentThread.CurrentCulture;
      Console.WriteLine("The current culture is {0}", culture1.Name);
      Console.WriteLine("The two CultureInfo objects are equal: {0}",
                        culture1 == culture2);

      var val = 222;
      Console.WriteLine($">> {val}");

      GregorianCalendar calendar = new GregorianCalendar();
      Console.WriteLine(calendar.GetMonthsInYear(11));

      Console.WriteLine("France".Equals(new RegionInfo("FR").NativeName));

      string guidString =  "00000001-57ee-1e5c-00b4-d0000bb1e11e";
      SortVersion version = new SortVersion(393742, new Guid(guidString));
      Console.WriteLine(393742 == version.FullVersion);
   }
}

if ((*ppSortHandle) == nullptr)
assert(ppSortHandle != NULL);

*ppSortHandle = (SortHandle*)malloc(sizeof(SortHandle*));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be malloc(sizeof(SortHandle)). Similar bug is in other places too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the main bug. Thanks, now it's all green.

} SortHandle;
const TCollatorMap* leftMap = left;
const TCollatorMap* rightMap = right;
return leftMap->key - rightMap->key;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can return wrong result in a general case. (Not sure for the range of keys that are possible here.)

Change this to correct by construction pattern?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is implemented correctly and in accord with the documentation https://linux.die.net/man/3/tsearch. No issues found so far.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. when left key is 0x7FFFFFFF and the right key key is -1, this will return incorrect result.

The correct way to implement this looks like this: https://github.com/dotnet/coreclr/blob/master/src/System.Private.CoreLib/shared/System/Int32.cs#L51

This issue causes corner case bugs that are not easy to find.

UChar* ptr = (UChar*)realloc(list->items, *currentCapacity * sizeof(UChar*));
if(ptr == NULL)
{
assert(false && "realloc unable to allocate memory.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not assert on OOM.

@jkotas
Copy link
Member

jkotas commented Jun 24, 2018

simple cases are working

Are you sure you are running the binary with your changes? I cannot believe that it would work by accident, even for simple cases like this.

int32_t length = uloc_getLanguage(locale, nullptr, 0, &status);
int32_t length = uloc_getLanguage(locale, NULL, 0, &status) + 1;

char* buf = calloc(length, sizeof(char*));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allocated length is off by one here and in the usage below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janvorli, like i've commented twice above that +1 is added in previous statement..

int32_t length = uloc_getCountry(locale, nullptr, 0, &status);
int32_t length = uloc_getCountry(locale, NULL, 0, &status) + 1;

char* buf = calloc(length, sizeof(char*));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again off by one allocation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, see previous like 136

set_target_properties(System.Globalization.Native PROPERTIES PREFIX "")

# Force CMake to not link against libstdc++
# http://cmake.3232098.n2.nabble.com/setting-LINKER-LANGUAGE-still-adds-lstdc-td7581940.html
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem that the URL applies to our case. It describes the case when a target contained cpp files. We now have all the files in "C", so the linker should recognize it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try it? This is the only option that works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to remove the changes here and it has built fine on my machine. Maybe you didn't do clean build and something has leaked from the previous state? Just try to deleted the whole bin folder before the build.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used cmake 3.5.1 and clang 3.9.1

Copy link
Author

@ghost ghost Jun 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't about the build failure, if you follow the comments above (#18614 (comment)), it was found with ldd System.Globalization.Native.so that we are still linking against libstdc++. With this change it fixed that issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I am sorry, I've missed that. But I've found the real culprit. The problem is that we are including one .cpp file in the build of the library - the version.cpp that is compiled into all targets to provide info on the version of the library. This is automatically added to all targets, so that's why you have not noticed it. See the _add_library function in functions.cmake.
Modifying the generator of version.cpp to generate file named version.c instead and updating the VERSION_FILE_PATH variable in the root CMakeLists.txt accordingly fixes the problem..

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks. I have renamed version.cpp to version.c and removed these hacks. Checked with ldd no libstdc++.


if (U_FAILURE(err))
{
ucal_close(pCal);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of failure, the pCal should not be closed (see the original holder). That's why the holder gets the err as argument too.
The same at many other places below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is ok. The ICU close methods deal with null pointers fine. Their examples use similar patterns. Also I have checked the ICU sources and they indeed handle null pointers fine.

@janvorli
Copy link
Member

Just a quick thought - I wonder if we could use the __attribute__((cleanup(cleanup_function))) for locals that need cleanup instead of relying on sprinkling the freeing or closing functions at right places in the code manually. @marek-safar do you if that would work for Mono? It seems that it is supported since gcc 3.3.6.

@jkotas
Copy link
Member

jkotas commented Jun 25, 2018

I do not think we should be using non-standard attributes. My strong preference for doing this sort of cleanup is goto done or goto error pattern that we are using in number of other places, e.g.:

https://github.com/dotnet/corefx/blob/master/src/Native/Unix/System.Security.Cryptography.Native/pal_ecc_import_export.c#L135
https://github.com/dotnet/corefx/blob/master/src/Native/Unix/System.Native/pal_process.c#L264

It is the most readable, portable and robust pattern among the different options.

@ghost
Copy link
Author

ghost commented Jun 25, 2018

Are you sure you are running the binary with your changes?

Yes.

@jkotas
Copy link
Member

jkotas commented Jun 27, 2018

I do not see any obvious bug by just looking at the code.

@ghost
Copy link
Author

ghost commented Jun 29, 2018

Type casting UChar (uint_least16_t) to char16_t shows the correct values (e.g. print (char16_t*)someUcharPtr).
I have tried debugging with SOS and lldb, but couldn't successfully set breakpoints in managed code (even with libsos plugin), only native ones are working. However, if someone has better tooling to test the branch, it would help.

@janvorli
Copy link
Member

@kasper3, what was the syntax of the bpmd command you have tried to use?

@ghost
Copy link
Author

ghost commented Jun 29, 2018

@janvorli, I was trying withsos bpmd /testapp/bin/Debug/netcoreapp2.1/linux-x64/publish/System.Private.CoreLib.dll System.Globalization.CultureInfo.GetCultureByName and this is the full story:

dotnet new console -n /testapp
cd testapp
cat >Program.cs <<EOF
using System.Globalization;
static class Program
{
    static void Main(string [] args)
    {
        new CultureInfo("ja-JP") { DateTimeFormat = { Calendar = new JapaneseCalendar() } };
    }
}
EOF
dotnet publish -r linux-x64

cp \
  /coreclr/bin/Product/Linux.x64.Debug/System.Globalization.Native.so \
  /coreclr/bin/Product/Linux.x64.Debug/libcoreclr.so                  \
  /coreclr/bin/Product/Linux.x64.Debug/libclrjit.so                   \
  /coreclr/bin/Product/Linux.x64.Debug/libmscordaccore.so             \
  /coreclr/bin/Product/Linux.x64.Debug/PDB/System.Private.CoreLib.pdb \
  /coreclr/bin/Product/Linux.x64.Debug/System.Private.CoreLib.dll     \
  /testapp/bin/Debug/netcoreapp2.1/linux-x64/publish/

lldb bin/Debug/netcoreapp2.1/linux-x64/publish/testapp
(lldb) target create "bin/Debug/netcoreapp2.1/linux-x64/publish/testapp"
Current executable set to 'bin/Debug/netcoreapp2.1/linux-x64/publish/testapp' (x86_64).
(lldb) plugin load /coreclr/bin/Product/Linux.x64.Debug/libsosplugin.so
(lldb) sos bpmd /testapp/bin/Debug/netcoreapp2.1/linux-x64/publish/System.Private.CoreLib.dll System.Globalization.CultureInfo.GetCultureByName
Failed to load data access DLL, 0x80131c4f
You can run the debugger command 'setclrpath' to control the load of libmscordaccore.so.
If that succeeds, the SOS command should work on retry.
bpmd /testapp/bin/Debug/netcoreapp2.1/linux-x64/publish/System.Private.CoreLib.dll System.Globalization.CultureInfo.GetCultureByName  failed

though I couldn't quite figure out how to set the line number with bpmd in managed code. For example in lldb, usually the native code break set -f path/file.c -l 123 works and gdb has even slimmer syntax: break path/file.c:123.

@janvorli
Copy link
Member

Couple of thoughts:

  • You are missing copying libsos.so to the target folder. This is likely causing the failure to load the data access dll. In general, I'd recommend copying everything *.so and *.dll there.
  • I always use the System.Private.CoreLib.dll in the bpmd command without the path (not sure if having the path there hurts or not)
  • A nit - you don't need to use the sos in front of the bpmd, but it doesn't hurt.

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did one more pass to see if I can spot something suspicious that would cause the japanese culture test to fail. I have not found anything like this, but I've found few nits and details.

const UChar* localeRules = ucol_getRules(pCollator, &localeRulesLength);
int32_t completeRulesLength = localeRulesLength + customRuleLength + 1;

UChar* completeRules = calloc(completeRulesLength, sizeof(UChar));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you are missing deletion of the completeRules in this function

delete (*ppSortHandle);
(*ppSortHandle) = nullptr;
free(*ppSortHandle);
(*ppSortHandle) = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that the original code was not doing that, but we are missing pthread_mutex_destroy call here - the CreateSortHandle has created the mutex.

|| (0xff61 <= character && character <= 0xff65);
}

static bool AddItem(UCharList* list, size_t* currentCapacity, const UChar item)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit - it seems the currentCapacity could be part of the UCharList too. It seems a bit strange that the size is part of it and the capacity is not.

@ghost
Copy link
Author

ghost commented Jun 30, 2018

@janvorli, is there a way to set a breakpoint (or tracepoint) by line number with bpmd?

@ghost
Copy link
Author

ghost commented Jul 1, 2018

@dotnet-bot test Ubuntu x64 Checked corefx_baseline

@ghost
Copy link
Author

ghost commented Jul 1, 2018

Japanese calendar problem also appeared while porting partial changes to C++ for #18733. It was the enum alignment. It is interesting that we can't use typedef enum __attribute((packed, aligned(2))) { ... } NAME to simulate C++ style derivation enum NAME : int16_t { ... }, because of what appears to be a gcc and clang bug: https://stackoverflow.com/a/33293280.

@jkotas, this PR is back in business; all green.

@jkotas
Copy link
Member

jkotas commented Jul 2, 2018

@krwq @tarekgh Does this look good to you?

  • Makes the globalization shim more portable (enables sharing it with Mono)
  • Smaller binary
  • Fixes subtle error handling bug (the current code is using C++ exception handling to handle errors like out of memory. If these error paths are ever hit, the C++ exceptions will propagate over PInvoke boundary that is not supported on Unix

@jkotas jkotas requested review from krwq and tarekgh July 2, 2018 00:13
int symbolLen = udat_getSymbols(pFormat, type, i, NULL, 0, &ignore) + 1;

std::vector<UChar> symbolBuf(symbolLen + 1, '\0');
UChar* symbolBuf = calloc(symbolLen, sizeof(UChar));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UChar* symbolBuf = calloc(symbolLen, sizeof(UChar)); [](start = 8, length = 52)

can we optimize this, instead of allocating inside every time inside the loop, we can just allocate once outside (maybe with 100 UChar) and then check inside the loop if we need to resize this buffer if needed. in most cases I wouldn't expect we'll resize anyway

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Current approach is to provide 1:1 replacement for existing code and reap the performance benefit of reduced code size of C vs. C++. If we need additional performance tweaks, that can be done separately?

UResourceBundle* calResBundle = ures_getByKey(rootResBundle, "calendar", NULL, &status);
UResourceBundle* targetCalResBundle = ures_getByKey(calResBundle, name, NULL, &status);
UResourceBundle* erasColResBundle = ures_getByKey(targetCalResBundle, "eras", NULL, &status);
UResourceBundle* erasResBundle = ures_getByKey(erasColResBundle, "narrow", NULL, &status);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this logic work fine? I mean the status values would be accumulated and note overridden? for example if ures_open failed and then some how the other subsequent calls succeed for any reason which will mark status as success.

would it be safer to check the status after every call? and don't have to continue of one failed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic works and it is actually the suggested error handling pattern if you look at ICU samples. We can use it in more places if we wanted it to make the code smaller and a bit faster. I agree that it looks like a suspect outlier now since nothing else in the file is using the ICU error handling pattern. There should be either comment, or it should use the same error handling pattern like the rest of the file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use it in more places if we wanted it to make the code smaller and a bit faster

Yes @jkotas, that was your comment and I cleaned it up from all places. Do you have any other place in mind which can be cleaned up?

Copy link
Member

@jkotas jkotas Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have commented about the ICU style error handling, but there was not any cleanup done to use it. The only place that uses the ICU style error handling in the change right now is this one.

You can read more about the design of the ICU error handling here: http://userguide.icu-project.org/design#TOC-Error-Handling. Here is one example from many where the ICU style error handling can be used to make the code smaller and a bit faster:

int32_t GlobalizationNative_GetLatestJapaneseEra()
{
    UErrorCode err = U_ZERO_ERROR;
    UCalendar* pCal = ucal_open(NULL, 0, JAPANESE_LOCALE_AND_CALENDAR, UCAL_TRADITIONAL, &err);
    int32_t ret = ucal_getLimit(pCal, UCAL_ERA, UCAL_MAXIMUM, &err);
    ucal_close(pCal);
    return U_SUCCESS(err) ? ret : 0;
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there was not any cleanup done to use it.

@jkotas, are you sure about that or am I missing something: a8ebe29?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to address it as part of this PR? I thought this pull request is approved..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs one of the following:

  • Change the error handling style here to match the style used in the rest of the change
  • Or change the error handling style throughout the change to use ICU-style

Copy link
Member

@jkotas jkotas Jul 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this pull request is approved.

I have approved it because of I did not have any more comments. However, I have looked at this change multiple times and I knew that I am likely missing things (it is "reviewers fatigue") since this change had many bugs and comments already. It is why I have asked more folks for review.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is there are many comments in this PR thread. It is hard for me to follow as well. I can do a follow up PR to address the rest of comments from green point marker onwards.


Here is one example from many where the ICU style error handling can be used to make the code smaller and a bit faster:

is it about removing:

if (U_FAILURE(err))
    return 0;

after *_open APIs calls?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the if (U_FAILURE(err)) checks are not needed in some cases after *_open calls thanks to the ICU error handling model. http://userguide.icu-project.org/design#TOC-Error-Handling.

I can do a follow up PR to address the rest of comments from green point marker onwards.

sounds good.

UCollator* pClonedCollator;
std::vector<UChar> customRules = GetCustomRules(options, strength, isIgnoreSymbols);
if (customRules.empty())
UCharList* customRules = GetCustomRules(options, strength, isIgnoreSymbols);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UCharList* customRules = GetCustomRules(options, strength, isIgnoreSymbols); [](start = 4, length = 76)

This maybe ok but I am wondering if this is kind of behavior change? I mean before if we had out of memory exception in C++ we wouldn't continue coming to this code. but now we are kind of ignoring the out of memory and just continue without using any of custom rules.
I just thinking loud what is the best we can do here? should we ignore the OOM or we should fail the whole call?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can change the signature from:

UCharList*      GetCustomRules(int32_t, UColAttributeValue, bool)

to:

RESULT_CODE     GetCustomRules(int32_t, UColAttributeValue, bool, UCharList**)

and return U_MEMORY_ALLOCATION_ERROR like we are doing elsewhere; including this file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be reasonable

const UChar* localeRules = ucol_getRules(pCollator, &localeRulesLength);
int32_t completeRulesLength = localeRulesLength + customRuleLength + 1;

UChar* completeRules = calloc(completeRulesLength, sizeof(UChar));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

completeRules [](start = 15, length = 13)

we should check if the allocation succeeded after calling calloc

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

}

extern "C" ResultCode GlobalizationNative_GetSortHandle(const char* lpLocaleName, SortHandle** ppSortHandle)
void CreateSortHandle(SortHandle** ppSortHandle)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CreateSortHandle [](start = 5, length = 16)

what is the benefit making this a method by itself and not part of GlobalizationNative_GetSortHandle?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResultCode GlobalizationNative_GetSortHandle(const char* lpLocaleName, SortHandle** ppSortHandle)

is on line 373

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was @janvorli's suggestion to make it a separate method in order to keep the initialization parts of sortHandle in one place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong feeling to change it but this initialization is not used anywhere else anyway and also when something fails we clean up in GlobalizationNative_GetSortHandle as we don't have a clean up method matching the allocation method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have a clean up method matching the allocation method

we have GlobalizationNative_CloseSortHandle, that does the cleanup

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean, in CreateSortHandle we allocate the memory and then initialize the mutex there. and then in case of the failure, we free the mutex and memory in GlobalizationNative_GetSortHandle which is weird for me. my point is, either to have everything encapsulated inside GlobalizationNative_GetSortHandle or you create CreateSortHandle and DestroySortHandle kind.

As I mentioned I am not strongly feeling about it so I'll leave it to you to decide if you need to change it or keep the current code.


TCollatorMap::iterator entry = pSortHandle->collatorsPerOption.find(options);
if (entry == pSortHandle->collatorsPerOption.end())
TCollatorMap* map = (TCollatorMap*)malloc(sizeof(TCollatorMap));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map [](start = 22, length = 3)

we should check if the allocation succeeded here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

if (isNegative && !minusAdded)
{
size_t length = (iEnd - iStart) + 2;
destPattern = calloc(length, sizeof(char));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calloc [](start = 22, length = 6)

should check for allocation

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

assert(U_SUCCESS(err));

std::string normalizedPattern = NormalizeNumericPattern(icuPattern.data(), isNegative);
char* normalizedPattern = NormalizeNumericPattern(icuPattern, isNegative);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NormalizeNumericPattern [](start = 30, length = 23)

Maybe we can male this method return the length as we already know it there and avoid calling strlen after

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will do that.

*/
typedef enum
{
C = 0x1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C [](start = 4, length = 1)

nit: Maybe naming these to FormC, FormD... would be better.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, not something that is diverging from master

C = 0x1,
D = 0x2,
KC = 0x5,
KD = 0x6

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is diverging from master. The C++ code in master is using the C++ style enum class for this. It means that you qualify the enum value using the enum name, like NormalizationForm::C. C without qualification does not compile.

You have change it to regular enum. It means that the references to the enum values are just C, D, etc. and it is not possible to qualify them. The standard naming convention for C enum is to prefix all value with the enum name or something close to it. Take a look at the naming pattern used for C enums in corefx shims.

@ghost ghost closed this Jul 2, 2018
@ghost ghost deleted the cpp-toc-globalization branch July 2, 2018 19:12
@tarekgh
Copy link
Member

tarekgh commented Jul 2, 2018

@kasper3 why did you close this PR? I think you were close to finish it

@jkotas
Copy link
Member

jkotas commented Jul 2, 2018

There are too many comments here and it is hard to navigate them. @kasper3 will open follow up PR: #18614 (comment)

This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants