Handling large tables #336
              
Unanswered

February24-Lee asked this question in Q&A
  
    Sign up for free
    to join this conversation on GitHub.
    Already have an account?
    Sign in to comment
  
        
    
Uh oh!
There was an error while loading. Please reload this page.
Hello, is there an efficient method or planned future development for handling large tables?

In my case, the table size is 1,442,792 x 12, with 2 categorical, 4 numerical, and 4 embedded-text columns. I cached all of the text with the OpenAI embedding API, so the embedding dimension of each text is almost 1,500. The problem is that this consumes too much memory and kills the process early (in fact, I found this was my mistake). It was inconvenient when running repeated short experiments.

So I modified the dataset and loader to execute convert_to_tensor_frame only when it is called. Here is my code: https://github.com/February24-Lee/pytorch-frame/pull/1/files I thought this method was the simplest and required the fewest modifications, though it is not fancy (a rough sketch of the general idea appears below).

Anyway, I'm curious about your thoughts or future plans for handling large tables like mine.

Thank you.
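Back-of-envelope, 1,442,792 rows x 4 text columns x ~1,500 dims is roughly 35 GB of float32 before any model sees a batch, so materializing everything up front is the likely culprit. Below is a minimal sketch of the same lazy idea in plain PyTorch, independent of the linked PR: keep the raw DataFrame and the cached embeddings as-is and only build per-row tensors on access. The class name, column lists, and the `text_emb` array are hypothetical, not pytorch-frame API.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset


class LazyTabularDataset(Dataset):
    """Keeps the raw DataFrame and cached text embeddings as-is and only
    builds per-row tensors on access, so the full ~1.4M x ~6,000-float
    matrix is never materialized at once."""

    def __init__(self, df: pd.DataFrame, text_emb: np.ndarray,
                 num_cols: list, cat_cols: list, target_col: str):
        self.df = df.reset_index(drop=True)
        self.text_emb = text_emb  # [num_rows, emb_dim]; can be a np.memmap on disk
        self.num_cols = num_cols
        self.cat_cols = cat_cols  # assumed to be integer-encoded already
        self.target_col = target_col

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        return {
            "num": torch.tensor(row[self.num_cols].to_numpy(dtype="float32")),
            "cat": torch.tensor(row[self.cat_cols].to_numpy(dtype="int64")),
            "text": torch.from_numpy(np.asarray(self.text_emb[idx], dtype=np.float32)),
            "y": torch.tensor(float(row[self.target_col])),
        }


# Batches are assembled on the fly, so peak memory scales with batch size,
# not with the number of rows in the table.
# loader = DataLoader(LazyTabularDataset(df, emb, num_cols, cat_cols, "label"),
#                     batch_size=1024, shuffle=True, num_workers=4)
```

If even the cached embeddings do not fit in RAM, saving them as a `.npy` file and loading with `np.load(..., mmap_mode="r")` keeps only the rows touched by each batch in memory.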
Replies: 1 comment 3 replies

I think one trick is to merge all the text columns and produce only one embedding from the merged columns. These are my personal thoughts; any comments are welcome.

3 replies
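If the merging trick in the reply fits your data, one rough sketch (pandas only; the column names and the `embed_text` helper standing in for the cached OpenAI embedding call are hypothetical) is to concatenate the text columns per row and embed the result once, so four ~1,500-dim embedding columns collapse into a single one:

```python
import pandas as pd

TEXT_COLS = ["title", "description", "review", "notes"]  # hypothetical column names


def embed_text(texts):
    """Stand-in for the cached OpenAI embedding call used in the question."""
    raise NotImplementedError


def merge_and_embed(df: pd.DataFrame) -> pd.DataFrame:
    # Join all text columns into one string per row; a separator keeps
    # the column boundaries visible to the embedding model.
    merged = df[TEXT_COLS].fillna("").astype(str).agg(" [SEP] ".join, axis=1)
    out = df.drop(columns=TEXT_COLS).copy()
    # One embedding per row instead of one per text column:
    # roughly 1,500 floats per row instead of 4 x 1,500.
    out["text_embedding"] = embed_text(merged.tolist())
    return out
```

The trade-off is that the model can no longer weight the four text columns separately, which may or may not matter for this table.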