I have a TensorFlow LSTM model for sentiment prediction. I built the model with a maximum sequence length of 150 (the maximum number of words per input). To make predictions, I wrote the code below:
import numpy as np

batchSize = 32
maxSeqLength = 150

# cleanSentences and wordsList are defined elsewhere in my code.
def getSentenceMatrix(sentence):
    # Only row 0 is filled; the other rows of the batch stay zero.
    sentenceMatrix = np.zeros([batchSize, maxSeqLength], dtype='int32')
    cleanedSentence = cleanSentences(sentence)
    cleanedSentence = ' '.join(cleanedSentence.split()[:150])
    split = cleanedSentence.split()
    for indexCounter, word in enumerate(split):
        try:
            sentenceMatrix[0, indexCounter] = wordsList.index(word)
        except ValueError:
            sentenceMatrix[0, indexCounter] = 399999  # Vector for unknown words
    return sentenceMatrix

input_text = "example data"
inputMatrix = getSentenceMatrix(input_text)
In this code I truncate the input text to 150 words and ignore the rest. Because of this, my predictions are wrong for longer inputs. The truncation happens on this line:
cleanedSentence = ' '.join(cleanedSentence.split()[:150])
I know that if the input is shorter than the sequence length, we can pad it with zeros. What should we do when the input is longer than the sequence length? Can you suggest the best way to handle this? Thanks in advance.
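To make the two cases concrete, here is a minimal sketch of what I mean. It assumes the words have already been converted to integer ids; the helper name `toWindows` is hypothetical and not part of my model code. Short inputs are zero-padded, and long inputs are split into consecutive windows of `maxSeqLength` ids instead of being truncated. Whether splitting like this is appropriate for the model is exactly what I am unsure about.

```python
import numpy as np

maxSeqLength = 150

def toWindows(wordIds, maxLen=maxSeqLength):
    """Split a list of word ids into zero-padded rows of length maxLen.

    Hypothetical helper for illustration only.
    """
    # Ceiling division; keep at least one row even for empty input.
    numRows = max(1, -(-len(wordIds) // maxLen))
    matrix = np.zeros([numRows, maxLen], dtype='int32')
    for row in range(numRows):
        chunk = wordIds[row * maxLen:(row + 1) * maxLen]
        # Ids fill the start of the row; the tail stays zero (padding).
        matrix[row, :len(chunk)] = chunk
    return matrix

shortMatrix = toWindows(list(range(1, 11)))    # 10 ids: one padded row
longMatrix = toWindows(list(range(1, 401)))    # 400 ids: three rows, last one padded
print(shortMatrix.shape, longMatrix.shape)
```

Each row could then be copied into a row of the `[batchSize, maxSeqLength]` input matrix and the per-window predictions combined (for example, averaged), but that combination step is an assumption on my part, not something I have verified.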