Skip to content

[RFC] regex-capi: expose regex::escape #537

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

marmeladema
Copy link
Contributor

I needed regex::escape available in the C API so i added it, but i have only done a few hours of rust in my life so this patch might be totally wrong. I have successfully been able to rust-regex-escape some strings from Python using ctypes to load the library though.

I followed this guide to handle String as returned type: http://jakegoulding.com/rust-ffi-omnibus/string_return/

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this would be a nice addition! I think there are a few small issues with the implementation in this PR that need to be addressed before merging though.

@marmeladema
Copy link
Contributor Author

Sorry to use this PR as a way to learn more about rust but i don't quite understand something.
Before using rure_error, i wanted just to return NULL instead of unwrap so i modified the actual code:

ffi_fn! {
    fn rure_escape(pattern: *const u8, length: size_t) -> *const c_char {
        let pat : &[u8] = unsafe{ slice::from_raw_parts(pattern, length) };
        let str_pat = str::from_utf8(pat).unwrap();
        let esc_pat = regex::escape(str_pat);
        let c_esc_pat = CString::new(esc_pat).unwrap();
        c_esc_pat.into_raw()
    }
}

to

ffi_fn! {
    fn rure_escape(pattern: *const u8, length: size_t) -> *const c_char {
        let pat : &[u8] = unsafe{ slice::from_raw_parts(pattern, length) };
        let str_pat = match str::from_utf8(pat) {
            Ok(val)  => val,
            Err(_) => return std::ptr::null(),
        };
        let esc_pat = regex::escape(str_pat);
        let c_esc_pat = match CString::new(esc_pat) {
            Ok(val)  => val,
            Err(_) => return std::ptr::null(),
        };
        c_esc_pat.into_raw()
    }
}

but i get an error on the last line saying that

   Compiling rure v0.2.0 (/home/adema/code/regex/regex-capi)                                                                                                                                                                                                   
error[E0308]: mismatched types                                                                                                                                                                                                                                 
   --> regex-capi/src/rure.rs:586:9                                                                                                                                                                                                                            
    |                                                                                                                                                                                                                                                          
586 |         c_esc_pat.into_raw()                                                                                                                                                                                                                             
    |         ^^^^^^^^^^^^^^^^^^^^ types differ in mutability                                                                                                                                                                                                  
    |                                                                                                                                                                                                                                                          
    = note: expected type `*const _`                                                                                                                                                                                                                           
               found type `*mut i8`                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                               
error: aborting due to previous error                                                                                                                                                                                                                          

Why suddenly the compiler is not able to infer the correct type ?
I had to force it with
c_esc_pat.into_raw() as *const c_char

@BurntSushi
Copy link
Member

Good question. I don't know off the top of my head. It might just be an inference failure.

@BurntSushi
Copy link
Member

@marmeladema I can't seem to reproduce your problem: https://play.rust-lang.org/?version=stable&mode=debug&edition=2015&gist=964bf197a31f7342fcf79df66163a45d

Which version of Rust are you using?

@marmeladema
Copy link
Contributor Author

marmeladema commented Nov 19, 2018

rustc 1.30.0 (da5f414c2 2018-10-24)
cargo 1.30.0 (36d96825d 2018-10-24)

Same error with:
rustc 1.30.1 (1433507eb 2018-11-07)
cargo 1.30.0 (a1a4ad372 2018-11-02)

And also with current nightly:
rustc 1.32.0-nightly (13c943992 2018-11-18)
cargo 1.32.0-nightly (b3d0b2e54 2018-11-15)

@BurntSushi
Copy link
Member

BurntSushi commented Nov 19, 2018

Dunno then. Can you reproduce it on the playground? If not, it might be the ffi_fn macro which does other stuff with the function body, including using it in places where coercion might not apply. But I don't know. I don't have time to dig into the details of automatic coercions right now. :-) If I had to guess, you might be able to consider this a bug/inference failure. But I'm not sure.

@marmeladema
Copy link
Contributor Author

PR updated with rure_escape and rure_escape_must and a test.
I had to introduce another variant of ErrorKind for std::ffi::NulError

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. Apologies for not catching this earlier, but reviewing this again, I'm realizing that while rure_escape_must probably looks OK, I think rure_escape is not quite right since pattern is actually allowed to contain a NUL byte. It's a bit of a corner case, but it's technically legal, so we should get it right. That means returning a const uint8_t * along with a length. So the function signature probably wants to be:

void rure_escape(const uint8_t *pattern, size_t length, const uint8_t **escaped, size_t *escaped_len, rure_error *error);

or something like that anyway.

With that said, I would be more than happy to accept a PR that dropped rure_escape and just had rure_escape_must. Sorry again for the run-around!

@marmeladema
Copy link
Contributor Author

I have an updated version working with prototype
const uint8_t *rure_escape(const uint8_t *pattern, size_t length, size_t *escaped_len, rure_error *error);
but before i amend my commit, i just want to understand fully the edge case here.
The only case where regex::escape would return a string containing a null byte is when it is already present in the input pattern ? Because i don't see any way of escaping something that would become a null byte.
If i am right, it means that:

  1. str::from_utf8 accepts bytes slice with null byte(s) inside
  2. regex::escape will not escapes input null bytes

I can understand 1., but 2. sounds weird to me for an escaping function. I understand its not necessary to escape null byte because String handle it properly, but it make a lot of things hard to deal with, like printing etc.

@marmeladema
Copy link
Contributor Author

@BurntSushi any news about this?

BurntSushi pushed a commit that referenced this pull request Mar 30, 2019
This commit exposes two new functions in regex's C API: rure_escape_must
and rure_cstring_free. These permit escaping a pattern such that it
contains no special regex meta characters.

Currently, we only expose a routine that will abort the process if it
fails, but we document the precise error conditions. A more flexible but
less convenient routine should ideally be exposed in the future, but
that needs a bit more API design than what's here.

Closes #537
@BurntSushi
Copy link
Member

I merged this in #567, after doing a bit of cleanup. I removed the rure_escape function and just kept rure_escape_must.

I'd be happy to accept another PR if you want to implement rure_escape, subject to my comments above. It should specifically permit NUL bytes without escaping them, but return an error on invalid UTF-8. The fact that interior NULs are difficult to deal with in C is not relevant, and it's why this function must use a const uint8_t * instead of a const char *.

Thanks for this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants