I have 3 CSV files that I am looping through to parse data.
Basically, I loop through the 3 CSV files to build key-value pairs and produce a resultant CSV file.
I want to remove duplicates from that file, where duplicates are determined by one of the keys.
Here is a code snippet:
for master in master_dict_list:
    for data in data_gen_dict_list:
        for account in account_list:
            if float(master[account['Target Model']]) > 0:
                # random deviation within +/- the configured percentage
                deviation = random.randint(-int(data['Security deviation %']),
                                           int(data['Security deviation %'])) / 100
                security_weight = float(master[account['Target Model']]) + (float(master[account['Target Model']]) * deviation)
                security_market_value = float(account['Market Value']) * security_weight / 100
                shares = round(security_market_value / float(master['Price']))
                positions_list.append({
                    'Account Code': account['Account Code'],
                    'Security ID': master['SecurityID'],
                    'Shares': shares,
                    'Security Market Value': security_market_value,
                    'Security % Weight': str(security_weight),
                    'Security Price': master['Price']
                })
Due to data_gen_dict_list, which has 2 entries, there are duplicates.
I want to remove duplicates where the security ID and account number are the same.
Here is what a duplicate could look like:
account no: 11, security id: abcd, shares: 1, smv: 22
account no: 11, security id: abcd, shares: 2, smv: 23
Ehhh, I'm not a Python guy xD so I wouldn't know how to give you a Python snippet, but a super simple solution could be storing the AccountCode and SecurityID as keys in an object (or whatever the better data structure is in Python), and before you push a new element to positions_list you check whether that key is already in the object you declared.
In JS I would do something like:
JavaScript:
// support variable: remembers which keys we've already appended
const csvKeys = {};
for (.....) {
    for (.....) {
        for (.....) {
            const checkForDuplicatesKey = `${account["Account Code"]}-${master["SecurityID"]}`;
            // if we DON'T find the key inside the support variable, we're good to go
            if (!csvKeys[checkForDuplicatesKey]) {
                csvKeys[checkForDuplicatesKey] = true;
                positions_list.push(YourStuff);
            }
        }
    }
}
So assuming we have these 2 elements:
account no: 11, security id: abcd, shares: 1, smv: 22
account no: 11, security id: abcd, shares: 2, smv: 23
During the first loop (smv: 22), csvKeys is empty, so we enter the if, and csvKeys becomes:
csvKeys = {
    "11-abcd": true
}
During the second loop, checkForDuplicatesKey will again be 11-abcd; it will be found inside our csvKeys object, and the duplicate (smv: 23) won't be added.
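The same "seen keys" idea translates directly to Python. Here is a sketch, assuming the row dicts look like the ones in the snippet above; a set of tuples plays the role of the JS object:

```python
# A set plays the role of the JS csvKeys object: it remembers which
# (account, security) pairs have already been appended.
seen_keys = set()
positions_list = []

# Two sample rows mimicking the duplicate pair described above.
rows = [
    {'Account Code': '11', 'Security ID': 'abcd', 'Shares': 1, 'Security Market Value': 22},
    {'Account Code': '11', 'Security ID': 'abcd', 'Shares': 2, 'Security Market Value': 23},
]

for row in rows:
    key = (row['Account Code'], row['Security ID'])
    if key not in seen_keys:  # only the first occurrence of each pair passes
        seen_keys.add(key)
        positions_list.append(row)

# positions_list now contains only the first (smv: 22) row.
```

Using a tuple as the set key avoids building a "11-abcd" string by hand, and set membership checks are O(1).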
Oh, maybe I didn't give the instructions properly: none of the lists I am looping over (master_dict_list, data_gen_dict_list, account_list) has duplicates.
The resultant positions_list that I am appending to is getting duplicates because data_gen_dict_list has two entries (ordered pairs), so that loop runs twice.
positions_list has 30,000 entries but it should have 15,000, because for each of the 15,000 accounts there are two duplicate securities.
I want to remove elements from positions_list where the account number and the security are both the same, since one account cannot hold two positions in the same security.
But that's the point of the logic: not appending an element with the same account code and security ID twice.
BTW, 4 for loops? TF?
Basically, before appending an element to the list, we check whether that element was already appended.
I don't know which data structure is most appropriate in Python (probably a dictionary), but to keep it simple let's use a list of strings.
So before the 3 for loops we declare an empty list:
# here we're going to store strings, each one a combination of an account code and security ID pair
alreadyAppendedElements = list()
If we ran a couple of iterations and printed it, it might look something like this:
["11-abc", "12-fgh", "13-etc", ...]
Also, before the if(float(master.......) check, we save the Account Code and SecurityID in 2 variables (actually we're going to use 3 variables):
accountCode = account["Account Code"]
securityID = master["SecurityID"]
# this one is important
duplicateKey = accountCode + "-" + securityID
Then, before appending, we check whether this duplicateKey is inside our alreadyAppendedElements, for example by adding an and to the if:
if(float(master....... and duplicateKey not in alreadyAppendedElements):
    # rest of the code
And after you append your stuff to positions_list, make sure you also add the duplicateKey to alreadyAppendedElements.
Basically, this way, while you are populating positions_list, you are making sure the same account code and security ID pair is never appended twice.
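Putting the steps above together, a small sketch of that guard could look like this (the loop bodies are elided; append_position is a hypothetical helper wrapping the check-then-append, and the field values are placeholders):

```python
# List of "account-security" strings recording which pairs were already appended,
# exactly as described above (a set would make the membership test O(1), but a
# plain list matches the explanation).
alreadyAppendedElements = []
positions_list = []

def append_position(account_code, security_id, row):
    """Append row only if its (account, security) pair hasn't been seen yet."""
    # this one is important: the combined key used for the duplicate check
    duplicateKey = account_code + "-" + security_id
    if duplicateKey not in alreadyAppendedElements:
        positions_list.append(row)
        alreadyAppendedElements.append(duplicateKey)

append_position('11', 'abcd', {'Shares': 1, 'Security Market Value': 22})
append_position('11', 'abcd', {'Shares': 2, 'Security Market Value': 23})  # duplicate pair, skipped
```

In the real code, the call site would sit inside the three nested loops, replacing the bare positions_list.append(...).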
I am not that good with Python, but I would have done as @U c 4 up da idly said: sort the 30,000 elements and then build a new array of 15,000 elements, discarding the duplicates, sort of like this =>
S = ['a','b','b','a','c','e','f','f','c','d','g','g','d','e']
S.sort()
new_array = []
temp = S[0]
new_array.append(temp)
for i in range(len(S)):
    if S[i] != temp:
        new_array.append(S[i])
        temp = S[i]
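Applied to the actual positions_list (dicts instead of single letters), the same sort-then-scan idea might look like this; the field names are assumed from the snippet earlier in the thread:

```python
# Sample positions with one duplicate (account, security) pair.
positions_list = [
    {'Account Code': '11', 'Security ID': 'abcd', 'Shares': 1},
    {'Account Code': '11', 'Security ID': 'abcd', 'Shares': 2},
    {'Account Code': '12', 'Security ID': 'wxyz', 'Shares': 5},
]

def dedup_key(p):
    """The pair that defines a duplicate: account + security."""
    return (p['Account Code'], p['Security ID'])

# Sort so duplicates become adjacent, then keep the first row of each run.
positions_list.sort(key=dedup_key)
deduped = []
prev_key = None
for p in positions_list:
    if dedup_key(p) != prev_key:
        deduped.append(p)
        prev_key = dedup_key(p)
```

Since Python's sort is stable, the first-appended row of each duplicate pair is the one that survives.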