@muellerzr
Owner

No description provided.

@marcglobality

Hi @muellerzr, thanks for this repo. What happened with this PR?

@muellerzr
Owner Author

Time and motivation :) I wound up moving on to other things, so the NLP part never came.

@muellerzr muellerzr closed this Mar 23, 2021
@marcglobality

I take it from you closing this that it won't come with this PR, then? :D

@muellerzr
Owner Author

Correct. I may reopen and update these notebooks, as they're good foundational NLP tutorial notebooks for fastai, but nothing beyond that is planned at the moment.

@muellerzr muellerzr reopened this Mar 23, 2021
@marcglobality

Fair enough. I was wondering how to predict on the test set. I know this is asking a lot of you (you don't need to answer fully; maybe just point me to the right place?).

What I've managed to do so far (simplified for readability):

from fastai.text.all import *
import pandas as pd

text_block = TextBlock.from_df(
    text_cols=INPUT_COLUMNS,
    is_lm=False,
    seq_len=1_000,
    # add xxfld between fields
    mark_fields=True,
    # name for the output column
    tok_text_col='ulmfit_text',
)

data_cls = DataBlock(
    blocks=(text_block, CategoryBlock),
    get_x=ColReader("ulmfit_text"),
    get_y=ColReader("label"),
    # the splitter is a DataBlock argument, not a dataloaders() argument
    splitter=ColSplitter("is_dev"),
)

data_cls = data_cls.dataloaders(
    pd.concat([
        df_train.assign(is_dev=False),
        df_dev.assign(is_dev=True),
    ]),
    shuffle_train=True,
    bs=32,
    verbose=False,
)

learn = text_classifier_learner(
    data_cls, 
    AWD_LSTM, 
    drop_mult=0.5, 
    metrics=[Precision(), Recall()]
)
learn.load_encoder("1epoch_encoder")
learn.fit_one_cycle(1, 2e-2)

and prediction on a single, already composed string runs correctly (again simplified for readability):

text = "xxbos xxfld 1 ......... solicitors xxfld 2 xxunk xxunk xxunk xxfld 3 xxmaj gao xxmaj jia..."
learn.predict(text)

My problem now is, given a test_df, how can I predict on it (with the pipeline composing the fields)? Something like:

test_df = pd.concat( 
    (
        df_dev[df_dev.label.eq(1)].sample(10, random_state=88),
        df_dev[df_dev.label.eq(0)].sample(10, random_state=88),
    )
)
dl = learn.dls.test_dl(test_df)
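# with_decoded=True makes get_preds return (probabilities, targets, decoded predictions)
_, __, preds = learn.get_preds(dl=dl, with_decoded=True)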
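
For context, the "pipeline composing the fields" refers to how `mark_fields=True` turns several input columns into one string with `xxfld N` markers, as in the text passed to `learn.predict` above. The following is a minimal plain-Python sketch of that composition step only; it is not fastai's implementation, and the function name `join_fields` and the column names are hypothetical, chosen for illustration.

```python
FLD = "xxfld"  # field-marker token, as it appears in the tokenized text above


def join_fields(row, text_cols, mark_fields=True):
    """Join several text columns of one row into a single string.

    Illustrative helper (not part of fastai): each column is optionally
    preceded by "xxfld <1-based index>".
    """
    parts = []
    for i, col in enumerate(text_cols, start=1):
        if mark_fields:
            parts.append(f"{FLD} {i}")
        parts.append(str(row[col]))
    return " ".join(parts)


# Hypothetical column names and values, for illustration only
row = {"title": "contract review", "body": "please see attached"}
joined = join_fields(row, ["title", "body"])
plain = join_fields(row, ["title", "body"], mark_fields=False)
print(joined)
```

A test set would need to go through the same composition (and the same tokenization) as the training data before the model sees it, which is why passing a raw DataFrame is the tricky part here.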

@muellerzr
Owner Author

IIRC you need to do proc_df or something along those lines. Have you searched in the forums? I know this was brought up in there (I think the v2 text thread)

@marctorsoc

> IIRC you need to do proc_df or something along those lines. Have you searched in the forums? I know this was brought up in there (I think the v2 text thread)

I searched the forums and arrived at a better version that at least predicts something. It still seems like a hack, though, and the dev-set metrics I get during fitting differ from the ones I get when using the dev set as a test set. Could you answer me there: https://forums.fast.ai/t/predictions-for-the-test-set-are-they-correct/86994 ? Thanks
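
One way to check such a mismatch independently of the training loop is to recompute precision and recall by hand from the predicted and true labels of the dev set. The sketch below is plain Python with made-up labels; the function name `precision_recall` and the assumption that the positive class is `1` are illustrative and not taken from the thread.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for one positive class from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when there are no positive predictions/labels
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels, for illustration only
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

If the numbers computed this way on the dev set used as a test set disagree with the metrics reported during fitting, the two runs are probably not seeing the same rows or the same preprocessing.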
