"Rare words" on vocabulary

ntonis asked Jan 30, 2021

712 views

I am trying to create a sentiment analysis model and I have a question.

After preprocessing my tweets and building my vocabulary, I noticed that many words appear fewer than 5 times in my dataset (and many of those appear only once). Many of them are real words, not gibberish. My thinking is that if I keep those words, they will get wrong "sentiment" weights and make my model worse.
Is my thinking right, or am I missing something?

My vocabulary size is around 40,000 words, and the "rare" ones are around 10k. Should I "sacrifice" them?
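
To make the question concrete, here is a minimal sketch of the kind of pruning I have in mind, assuming the tweets are already tokenized into lists of strings. The names `build_vocab`, `encode`, `min_count`, and the `<unk>` token are hypothetical and not taken from any library. Mapping rare words to one shared "unknown" token instead of deleting them is just one common option, not necessarily the right choice here.

```python
from collections import Counter


def build_vocab(tokenized_tweets, min_count=5, unk_token="<unk>"):
    """Keep words seen at least `min_count` times; report the rest as rare."""
    counts = Counter(tok for tweet in tokenized_tweets for tok in tweet)
    kept = sorted(w for w, c in counts.items() if c >= min_count)
    rare = sorted(w for w, c in counts.items() if c < min_count)
    # Index 0 is reserved for the shared unknown token (an assumption).
    word_to_id = {unk_token: 0}
    for w in kept:
        word_to_id[w] = len(word_to_id)
    return word_to_id, rare


def encode(tweet, word_to_id, unk_token="<unk>"):
    """Replace every out-of-vocabulary word with the unknown token's id."""
    unk_id = word_to_id[unk_token]
    return [word_to_id.get(tok, unk_id) for tok in tweet]


# Toy data: "good" appears 5 times, every other word appears once.
tweets = [
    ["good", "movie"], ["good", "day"], ["good", "vibes"],
    ["good", "food"], ["good", "serendipity"],
]
word_to_id, rare = build_vocab(tweets, min_count=5)
print(word_to_id)
print(rare)
print(encode(["good", "serendipity"], word_to_id))
```

With a threshold like this, the rare words still occupy a position in each tweet but share a single id, so the model never learns a separate weight for each of them.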

Thanks in advance.


